We construct here a abundance correlation based network via WGCNA on CLR transformed 16S data and visualize different aspects in integrated heatmaps and direct correlation to environmental and physiological parameters. The Output is directly handed over to Cytoscape which are # here to allow for knitting Picurst was run on an HPC but seems not informative with many undescribed or lowly described bacterial taxa in this estuarine fish gill dataset.

1 Global Setup

1.1 Path

before you start: .rs.restartR()

1.2 Packages

1.2 Functions

1.4 Input Files

1.5 Tutorials

1.6 Setup Analysis

#-

9 Overall Network

Some seasonal networks as conststuced here:

https://journals.asm.org/doi/full/10.1128/spectrum.02943-23

Finally, three metrics have been extracted from the networks with the function “Network Analyzer” in Cytoscape: degree (DG), neighborhood connectivity (NC), and closeness centrality (CC). Those network topological parameters allow a better understanding of the dynamics in different networks. DG is the number of edges connected from one node to another (184). The more a node has edges, the more it is locally connected, indicating its relevance in the network (185). CC is a qualitative measure that indicates if a node is close to the other nodes in the network (186). The shortest path length to spread information from one node to another is represented, and it indicates if the node has an important influence in the network and how it can interact with other nodes (184). Finally, NC is a quantitative measure that calculates the average connectivity of a node to the other nodes in the network. It indicates how the node impacts in the network dynamics (187).

works, just un-# the whole chunk

closeness centrality (CC). CC is a qualitative measure that indicates if a node is close to the other nodes in the network (186). The shortest path length to spread information from one node to another is represented, and it indicates if the node has an important influence in the network and how it can interact with other nodes (184).

Finally, NC is a quantitative measure that calculates the average connectivity of a node to the other nodes in the network. It indicates how the node impacts in the network dynamics (187).

9.2 WGCNA

9.2.1 Setup

WGCNA_list <- list()
for (Dataset in c(names(pslist)[!grepl("WF", names(pslist))][grepl("ps_GC|ps_OE", names(pslist)[!grepl("WF", names(pslist))])], "ps_WF")) {
  
  require(WGCNA)
  require(plyr)
  require(dplyr)
  require(ggrepel) 
  require(cowplot)
  require(phyloseq)
  
  Datasetname <- sub("ps_", "", Dataset)
  
  ps <- pslist[[Dataset]]
  psraw <-pslistraw[[Dataset]]
  
  WGCNA_list_length <- length(WGCNA_list)
  
  Analysis   <- "WGCNA"
  prefix <- "SSU-"

  if (Datasetname %in% c("OE", "GC")) {
  
    traitData <- c(
  "HSI", "SSI", "GSI", "FCF", "FillLevel", "Age", "Length", 
  "NH4", "NO2", "NO3", "O2", "PO4", "TOC", "Temp",  "SPM", "Salinity"
  
  ) } else if (Datasetname  %in% c("WF")) {
  
    traitData <- c(
  "NH4", "NO2", "NO3", "O2", "PO4", "TOC", "Temp",  "SPM", "Salinity"
  ) }
  
  omics_data0 <- as.data.frame(otu_table( microbiome::transform(ps, "clr"))) #%>% t()
  
  WGCNA_list[[Datasetname]]                  <- list()
  WGCNA_list[[Datasetname]][["plots"]]       <- list()
  WGCNA_list[[Datasetname]][["ps"]]          <- pslist[[Dataset]]
  WGCNA_list[[Datasetname]][["psraw"]]       <- pslistraw[[Dataset]]
  WGCNA_list[[Datasetname]][["omics_data0"]] <- omics_data0
  WGCNA_list[[Datasetname]][["traitData"]]   <- traitData 
}
## Loading required package: WGCNA
## Warning: package 'WGCNA' was built under R version 4.3.1
## Loading required package: dynamicTreeCut
## Loading required package: fastcluster
## Warning: package 'fastcluster' was built under R version 4.3.1
## 
## Attaching package: 'fastcluster'
## The following object is masked from 'package:stats':
## 
##     hclust
## 
## 
## Attaching package: 'WGCNA'
## The following object is masked from 'package:stats':
## 
##     cor
## Loading required package: plyr
## Warning: package 'plyr' was built under R version 4.3.1
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following object is masked from 'package:purrr':
## 
##     compact
## Loading required package: ggrepel
## Warning: package 'ggrepel' was built under R version 4.3.1
## Loading required package: cowplot
## Warning: package 'cowplot' was built under R version 4.3.1
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:lubridate':
## 
##     stamp
## Loading required package: phyloseq

9.2.2 Pre-filtering & Transormation

Some comments on normalization: - Normalization: Strand et al., 2021 4.2.1. Data preprocessing Transcript expression counts were summed for each gene. Expression values were normalized across samples using the Trimmed mean of M values (TMM)-method [33], and log2- transformed. Genes with expression levels below 1.0 in all samples and/or with a standard deviation less than 0.15 were removed before network inference. This reduced the number of genes from 48,057 to 37,408. We retained OTUs that contribute at least 0.005% of the total microbial abundance. This reduced the number of OTUs from 1152 to 296 and the number of zeros in the abundance table by 50%. OTU abundances was normalized using the Cumulative Sum Scaling (CSS) method from the Bioconductor package metagenome- Seq, which by default log2-transforms the data.

Swift et al. 2021: Cumulative-Sum Score (CSS) uses robust statistics to provide an alternative to TSS that is less influenced by preferentially sampled taxa. Its normalization scaling factor is defined as the cumulative sum of the observed counts up to a threshold which is determined using a heuristic that minimizes the influence of the preferentially sampled taxa. The idea is that CSS scales each sample using only the part of the counts distribution that is relatively invariant (H. Lin &Peddada, 2020a; Weiss et al., 2017). However, the determination of the thresholds could fail due to high count variability (J. Chen et al., 2018). CSS or TSS do not account for sparsity or address compositionality.

The DESeq scaling factor of the observed abundances for each sample is computed as the median of all the ratios between counts of the sample and the reference. Assuming most taxa are not differentially abundant, the median of these ratios would provide an estimate of a correction factor for all the read counts. Like most normalizations, TMM and DESeq rely on the strong assumptions that most taxa are not differentially abundant, and of those differentially abundant, there is an approximate balanced amount of over-abundance and under-abundance (Calza & Pawitan, 2010; Dillies et al., 2012; Gloor et al., 2017; Weiss et al., 2017). These may be reasonable assumptions for gene expression data but may not be valid for microbiome data which tends to be diverse. Also, while TMM and DESEq normalization are designed for compositional data, they do not address sparsity. In fact, large fractions of zeros and relatively low sequencing depths can be problematic for both methods and can lead to serious biases (Kumar et al., 2018). As stated previously, adding pseudo-counts to the original count data will not rectify the problem.

Large scale tool benchmarking by Thorsen et al. has revealed that most of the commonly used differential (relative) abundance testing methods including edgeR and DESeq are sensitive to sparsity (Thorsen et al., 2016) and consequently exhibit unacceptably high false positive rates (Hawinkel et al., 2019). Also, recent investigations by H. Lin and Peddada (2020a, 2020b) indicate poor performance of these methods for microbiome data. However, based on simulations and analytical derivations, they attribute the poor performance to the violation of the assumption that most taxa do not change. As a result, the false discovery rate is inflated, and it increases as the sample size increases. Similar behavior of these methods was also seen by Weiss et al. (2017). Thus, H. Lin and Peddada (2020a) do not advocate the use of edgeR and DESeq for the analysis of microbiome data.

following http://mixomics.org/mixmc/mixmc-preprocessing/ Arumugam et al., (2011)

for (Datasetname  in names(WGCNA_list)) {
  
  psraw <- WGCNA_list[[Datasetname]]$psraw
  
  library(phyloseq)
  library(tidyverse)
  require(DESeq2)
  require(SummarizedExperiment)

  low.count.removal <- function(
                        data, # OTU count df of size n (sample) x p (OTU)
                        percent=0.005 # cutoff chosen
                        ) {
    keep.otu = which(colSums(data)*100/(sum(colSums(data))) > percent)
    data.filter = data[,keep.otu]
    return(list(data.filter = data.filter, keep.otu = keep.otu))}

  #######################################################
  #Plot consequence of filtering from Strand et al, 2021#
  #######################################################
  frac_zero <- c()
  all_sum   <- c()
  num_otu   <- c()
  seqp <- seq(0,0.1,0.0001)
  for(i in seqp){
  result.filter = low.count.removal(otu_table(psraw), percent=i)
  length(result.filter$keep.otu)
  otu.f = result.filter$data.filter
  frac_zero <- c(frac_zero, sum(otu.f == 00)/ (ncol(otu.f)*nrow(otu.f)))
  all_sum <- c(all_sum, sum(otu.f))
  num_otu <- c(num_otu, ncol(otu.f))
  }
  names(frac_zero) <- seqp
  names(all_sum) <- seqp

  applied_filter <- 0.005
  # Get all_sum and num_otu to a fraction of the total
  mas <- max(all_sum)
  mno <- max(num_otu)

  all_sum %>% (function(x){x/max(x)}) -> all_sum
  num_otu %>% (function(x){x/max(x)}) -> num_otu
  data.frame(filter = seqp, 
           Percent.zeros   = frac_zero*100, 
           Total.abundance = all_sum*100, 
           Number.of.OTUs  = num_otu*100) %>% 
  tidyr::gather(key = "Type", value = "percent.of.total", -filter) %>%
  ggplot() + 
  geom_line(mapping = aes(x = filter, y = percent.of.total, col = Type)) +
  geom_vline(xintercept = applied_filter, alpha = 0.5, color = "red", linetype="dashed")+
  ylab("Percent of total") +
  xlab(paste("Filter at", applied_filter)) +
  theme(legend.title = element_blank()) -> FilterZerosPlot

  WGCNA_list[[Datasetname]][["plots"]][["FilterZerosPlot"]] <- FilterZerosPlot
  
  plot(FilterZerosPlot)


  ##############
  #Plot clr PCR#
  ##############
  TSE <-mia::makeTreeSummarizedExperimentFromPhyloseq(microbiome::transform(WGCNA_list[[Datasetname]]$ps, "clr"))
  
  tryCatch({
  SSUclrPCAPlot<-pcaplotRK3(TSE,intgroup = c("Reps"), pcX = 1, pcY = 2, title="",ellipse = TRUE,     ellipse.prob = 0.5) + 
  scale_fill_manual(values=col.Palette$col.Palette.Reps) + #col.Palette.SeqCenter #col.Palette.Cruises
  scale_color_manual(values=col.Palette$col.Palette.Reps) + atheme +
  theme_minimal() + atheme + theme(axis.title.x = element_blank()) +
  theme(
        panel.grid.major = element_line(colour = "grey50"), 
        panel.grid.minor = element_line(colour = "grey50"))
  prow <- cowplot::plot_grid(SSUclrPCAPlot, labels = c(""), ncol = 1)
  title <- ggdraw() + draw_label_themeRKwhite(paste(Datasetname), element = "plot.title",x = 0.05, hjust = 0,    vjust = 1)
  subtitle <- ggdraw() + draw_label_themeRKwhite(paste("clr-PCA", "with", 
  length(rownames(TSE)),"bacterial species",sep = " "), element = "plot.subtitle",x = 0.05, hjust = 0,   
  vjust = 1)
  SSUclrPCAPlot<- plot_grid(title, subtitle, prow, ncol = 1, rel_heights = c(0, 0.05, 0.989))
  ggsave(SSUclrPCAPlot, filename = paste(paste(save_name, "SSU", "Filter-clrPCA", Datasetname, sep="_"),".png", sep=""), path = pathPlots , device='png', dpi=300, width = 7,
  height = 7)

  plot(SSUclrPCAPlot)
  
  }, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
  
  WGCNA_list[[Datasetname]][["plots"]][["SSUclrPCAPlot"]] <- SSUclrPCAPlot
  
}
## [1] "Adapted from Marini F, Binder H (2019). “pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components.” BMC Bioinformatics, 20(1), 331. doi:10.1186/s12859-019-2879-1, https://bioconductor.org/packages/pcaExplorer/."

## [1] "Adapted from Marini F, Binder H (2019). “pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components.” BMC Bioinformatics, 20(1), 331. doi:10.1186/s12859-019-2879-1, https://bioconductor.org/packages/pcaExplorer/."

## [1] "Adapted from Marini F, Binder H (2019). “pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components.” BMC Bioinformatics, 20(1), 331. doi:10.1186/s12859-019-2879-1, https://bioconductor.org/packages/pcaExplorer/."

9.2.1 Data Input

for (Datasetname  in names(WGCNA_list)) {
  
  omics_data0 <- WGCNA_list[[Datasetname]]$omics_data0
  ps <- WGCNA_list[[Datasetname]]$ps
  traitData  <- WGCNA_list[[Datasetname]][["traitData"]]  
  
  #Import Data
  library(WGCNA)
  # The following setting is important, do not omit.
  options(stringsAsFactors = FALSE);

  dim(omics_data0 %>% paste(c("Samples", "OTUs")))
  gsg = goodSamplesGenes(omics_data0, verbose = 3);
  gsg$allOK

  #Check Data Quality
  if (!gsg$allOK)
  {
  # Optionally, print the gene and sample names that were removed:
  if (sum(!gsg$goodGenes)>0) 
     printFlush(paste("Removing genes:", paste(names(omics_data0)[!gsg$goodGenes], collapse = ", ")));
  if (sum(!gsg$goodSamples)>0) 
     printFlush(paste("Removing samples:", paste(rownames(omics_data0)[!gsg$goodSamples], collapse = ", ")));
  # Remove the offending genes and samples from the data:
  omics_data0 = omics_data0[gsg$goodSamples, gsg$goodGenes]
  }

  #Plot Euclidean Sample Distance
  sampleTree = hclust(dist(omics_data0), method = "average");
  # Plot the sample tree: Open a graphic output window of size 12 by 9 inches
  # The user should change the dimensions if the window is too large or too small.
  #pdf(file = "Plots/sampleClustering.pdf", width = 12, height = 9);
  par(cex = 0.6);
  par(mar = c(0,4,2,0))
  plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5, 
     cex.axis = 1.5, cex.main = 2) 


  # Plot a line to show the cut
  abline(h = 100, col = "red");
  # Determine cluster under the line
  clust = cutreeStatic(sampleTree, cutHeight = 100, minSize = 10)
  table(clust)
  # clust 1 contains the samples we want to keep.

  #in Case we would exclude outliers from the tree: 
  #keepSamples = (clust==1)
  #omics_data = omics_data0[keepSamples, ]

  recordPlot() -> SampleClusteringTreePlot
  WGCNA_list[[Datasetname]][["plots"]][["SampleClusteringTreePlot"]] <- SampleClusteringTreePlot
  
  
  
  ###############################
  #Subset samples when excluding#
  ###############################
  
  #Keep all samples
  omics_data <- omics_data0
  nGenes = ncol(omics_data)
  nSamples = nrow(omics_data)

  #Clean TraitData
  datTraits <-SAMDF16S %>% 
    filter(SampleID %in% rownames(omics_data))
  rownames(datTraits) <- datTraits$SampleID
    
  datTraits<- datTraits[,traitData]
  print("NAs in TraitData")
  
  print(table(is.na(datTraits)))
  
  print((datTraits[!complete.cases(datTraits), ]))
  
  datTraits  <- datTraits[complete.cases(datTraits), ]

  dim(datTraits)

  omics_data <- omics_data[rownames(omics_data) %in% rownames(datTraits),]
  dim(omics_data)

  ps <- prune_samples(sample_names(ps)[sample_names(ps) %in% rownames(omics_data)], ps) 
  
  print(Datasetname)
  print(ps)
  
  collectGarbage()

  
  # Re-cluster samples
  sampleTree2 = hclust(dist(omics_data), method = "average")
  # Convert traits to a color representation: white means low, red means high, grey means missing entry
  traitColors = numbers2colors(datTraits, signed = FALSE)
  # Plot the sample dendrogram and the colors underneath.
  plotDendroAndColors(sampleTree2, traitColors,
                    groupLabels = names(datTraits), 
                    main = "Sample dendrogram and trait heatmap")
  recordPlot() -> SampleClusteringTreeTraitsPlot
  WGCNA_list[[Datasetname]][["plots"]][["SampleClusteringTreeTraitsPlot"]] <- SampleClusteringTreeTraitsPlot

  WGCNA_list[[Datasetname]][["omics_data"]] <- omics_data
  WGCNA_list[[Datasetname]][["datTraits"]] <- datTraits
  WGCNA_list[[Datasetname]]$ps <-  ps

}
##  Flagging genes and samples with too many missing values...
##   ..step 1

## [1] "NAs in TraitData"
## 
## FALSE 
##  2208 
##  [1] HSI       SSI       GSI       FCF       FillLevel Age       Length   
##  [8] NH4       NO2       NO3       O2        PO4       TOC       Temp     
## [15] SPM       Salinity 
## <0 rows> (or 0-length row.names)
## [1] "OE"
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1552 taxa and 138 samples ]
## sample_data() Sample Data:       [ 138 samples by 61 sample variables ]
## tax_table()   Taxonomy Table:    [ 1552 taxa by 7 taxonomic ranks ]
## refseq()      DNAStringSet:      [ 1552 reference sequences ]

##  Flagging genes and samples with too many missing values...
##   ..step 1

## [1] "NAs in TraitData"
## 
## FALSE 
##  1408 
##  [1] HSI       SSI       GSI       FCF       FillLevel Age       Length   
##  [8] NH4       NO2       NO3       O2        PO4       TOC       Temp     
## [15] SPM       Salinity 
## <0 rows> (or 0-length row.names)
## [1] "GC"
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1344 taxa and 88 samples ]
## sample_data() Sample Data:       [ 88 samples by 61 sample variables ]
## tax_table()   Taxonomy Table:    [ 1344 taxa by 7 taxonomic ranks ]
## refseq()      DNAStringSet:      [ 1344 reference sequences ]

##  Flagging genes and samples with too many missing values...
##   ..step 1

## [1] "NAs in TraitData"
## 
## FALSE 
##   171 
## [1] NH4      NO2      NO3      O2       PO4      TOC      Temp     SPM     
## [9] Salinity
## <0 rows> (or 0-length row.names)
## [1] "WF"
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 2364 taxa and 19 samples ]
## sample_data() Sample Data:       [ 19 samples by 61 sample variables ]
## tax_table()   Taxonomy Table:    [ 2364 taxa by 7 taxonomic ranks ]
## refseq()      DNAStringSet:      [ 2364 reference sequences ]

saveRDS(WGCNA_list, file = paste0(file.path(path_Output_16S, paste(paste(save_name, "WGCNA_list",  Date, sep="_"), ".rds", sep=""))))


##############
#Summary Plot#
##############
for (Datasetname  in names(WGCNA_list)) {
  plots <- WGCNA_list[[Datasetname]][["plots"]]
  
  cowplot::plot_grid(plots$FilterZerosPlot , plots$SSUclrPCAPlot, labels = c("A", "B"), ncol = 2, 
                     rel_heights = c(1,1)) -> part_1
  cowplot::plot_grid(part_1 , plots$SampleClusteringTreeTraitsPlot, labels = c("", "C"), ncol = 1) -> part_2
  ggsave(part_2, filename = paste(paste(save_name, "SSU_DataInputPlot", Datasetname, Date, sep="_"),".png", sep=""), path = pathPlots , device='png', dpi=300, width = 10,
  height = 12)
}

9.2.2 Network construction

Rosales et al., 2023 Network analysis To identify ASVs that co-associate in AH, DU, and DL samples, CLR-transformed counts were used for weighted correlation network analysis (WGCNA) with the WGCNA 1.70-3R package [76]. The network was constructed unsigned with the following parameters: power = 3, minimum module size = 12, deep split = 2, and merged cut height = 0.25. The eigenvalues were correlated to AH, DU, and DL using Pearson correlation with the R function cor. The highest correlation in each disease state was then selected for network construction using the R package SpiecEasi 1.0.5 [77]. The network was then constructed as previously reported [11]. Briefly, the Stability Approach to Regularization Selection (StARS) [77] model was chosen along with the method Meinshausen–Bühlmann’s neighborhood selection [78]. For StARS, 100 subsamples were used with a variability threshold of 10−3. The centrality (node importance) was evaluated [79] using the functions centrality_degree (neighbors = the number of adjacent edges or neighbors) and centrality_edge_betweenness (centrality = the number of shortest paths going through an edge) [80]. The package influenceR 0.1.0. [81] selected important ASVs in each network and assigned the top “key players” [38], which were labeled with their respective orders.

Jameson et al., 2023 Co-occurrence patterns between taxa with putative roles in N2O production and the rest of the microbial community ASVs were explored using proportionality analysis within the propr package52. ASV tables were trimmed to select taxa that occurred ≥10 times in at least 10% of samples prior to network-level analyses to improve interpretability and minimize the risk of spurious correlations. Pairwise interactions between individual taxa with rho values greater than 0.60 were plotted using Cytoscape v3.9.0 and network topological indices were calculated using the NetworkAnalyzer tool114. Relationships between microbial community structure and rate processes were then assessed using weighted gene correlational network analysis (WGCNA) performed with the WGCNA package115. The signed adjacency measure was first calculated for each pair of features (ASVs) by raising the absolute value of their pairwise correlation coefficients to a soft-thresholding power of 8 to maximize the scale-free topology fit. Hierarchical clustering of taxa into discrete subnetworks was completed using a minimum module size threshold of 20 and a dissimilarity threshold of 0.3. Pearson correlation coefficients and corresponding p-values are reported for correlations between sample traits, subnetwork eigengenes, and individual ASVs (Supplementary Data 1). Subnetwork membership and intranetwork connectivity measures are also reported for each ASV and were used in further analyses to assess broad relationships between ASV connectivity and importance with respect to N2O production rates.

Mach et al., 2021 To confirm the results of the DE analysis, the weighted gene co-expression network analysis (WGCNA) method was also run on the “E1” matrix using the WGCNA R package (version 1.69) (Langfelder and Horvath, 2008). The parameters for the analysis were set as follows: “corFnc” = bicor, “type” = signed hybrid, “beta” = 10, “deepSplit” = 4, “minClusterSize” = 30, “cutHeight” = 0.1. The eigengenes corresponding to each identified module were correlated individually to all the 1H NMR and biochemical assay metabolites, i.e., a set of 56 different molecules (see next paragraphs). A module was then considered positively or negatively associated to this set of molecules if the Pearson r correlation values were ≥ |0.65| for at least 5 molecules and if all the corresponding p-values were ≤ 1e−05. The positively and the negatively correlated modules defined in this way were merged to obtain a single gene list, which was subsequently compared to the differentially expressed genes (DEGs) list using a Venn diagram.

Strand et al, 2021 4.2.2. Network inference For network inference, we used the Weighted Gene Co-expression Network Analysis (WGCNA) R package [24] and the function blockwiseModules with the bicor correlation measure and parameters maxBlockSize = 10000, networkType = “signed”, TOMType = “signed”, corType = “bicor”, maxPOutliers = 0.05, replaceMissingAdjacencies = TRUE, pamStage = F, deepSplit = 4, minModuleSize = 2, minKMEtoStay = 0.5, minCoreKME = 0.5, minCoreKMESize = 2, reassignThreshold = 0 mergeCutHeight = 0.4/0.5 (for microbiota/host respectively).

Our analysis relies heavily on network modules, and hence the parameters related to module detection and trimming influence the results. Briefly, deepSplit controls the sensitivity of the module detection approach by hierarchical clustering, with a value of 1 being the least sensitive and 4 being the most sensitive. minModuleSize controls the minimum size of modules in the clustering step. Nodes with a correlation to the module eigennode (KME) lower than minKMEtoStay are trimmed from the module, and the module is deleted if it does not have a core of at least minCoreKMESize nodes (with core nodes being defined as having a KME greater than minCoreKME). Finally, different modules with eigennodes that correlate above the 1 – mergeCutHeight threshold are merged. Note that the final modules can be smaller than minModuleSize due to trimming (but not smaller than minCoreKMESize), and that they can include nodes with a KME lower than minKMEtoStay due to module merging. Our parameters were set to detect highly correlated and potentially small modules initially, thus not missing interesting profiles displayed by few genes/OTUs, and then to apply an aggressive merging threshold to avoid dealing with highly redundant modules in downstream analysis.

9.2.2.1 Pick Soft Threshold
for (Datasetname  in names(WGCNA_list)) {
  
  library(WGCNA)
  # The following setting is important, do not omit.
  options(stringsAsFactors = FALSE)
  
  omics_data <- WGCNA_list[[Datasetname]]$omics_data
  datTraits  <- WGCNA_list[[Datasetname]]$datTraits
  #Import Data

  # Choose a set of soft-thresholding powers
  powers = c(c(1:10), seq(from = 12, to=20, by=2))
  # Call the network topology analysis function
  allowWGCNAThreads() #needed otherwise would not work! https://www.biostars.org/p/122349/
  sft = pickSoftThreshold(omics_data, powerVector = powers, verbose = 5)
  # Plot the results:
  # Find the soft thresholding power beta to which co-expression similarity is raised to calculate adjacency.
  # based on the criterion of approximate scale-free topology.
  idx <- min(which((-sign(sft$fitIndices[,3])*sft$fitIndices[,2]) > 0.90))
  if(is.infinite(idx)){
    idx <- min(which((-sign(sft$fitIndices[,3])*sft$fitIndices[,2]) > 0.80))
    if(!is.infinite(idx)){
      st <- sft$fitIndices[idx,1]
    } else{
      idx <- which.max(-sign(sft$fitIndices[,3])*sft$fitIndices[,2])
      st <- sft$fitIndices[idx,1]
    }
  } else{st <- sft$fitIndices[idx,1]}
  # Plot Scale independence measure and Mean connectivity measure

  # Scale-free topology fit index as a function of the soft-thresholding power
  data.frame(Indices = sft$fitIndices[,1],
           sfApprox = -sign(sft$fitIndices[,3])*sft$fitIndices[,2]) %>% 
  ggplot() + 
    geom_hline(yintercept = 0.9, color = "red", alpha = 0.6) + # corresponds to R^2 cut-off of 0.9
    geom_hline(yintercept = 0.8, color = "red", alpha = 0.2) + # corresponds to R^2 cut-off of 0.8
    geom_line(aes(x = Indices, y = sfApprox), color = "red", alpha = 0.1, size = 2.5) +
    geom_text(mapping = aes(x = Indices, y = sfApprox, label = Indices), color = "red", size = 4) +
    ggtitle("Scale independence") +
    xlab("Soft Threshold (power)") +
    ylab("SF Model Fit,signed R^2") +
    xlim(1,20) +
    ylim(-1,1) +
    geom_segment(aes(x = st, y = 0.25, xend = st, yend = sfApprox[idx]-0.05), 
               arrow = arrow(length = unit(0.2,"cm")), 
               size = 0.5)-> scale_independence_plot 
  
  
  WGCNA_list[[Datasetname]][["plots"]][["scale_independence_plot"]] <-scale_independence_plot 
  
  # Mean connectivity as a function of the soft-thresholding power

  data.frame(Indices = sft$fitIndices[,1],
           meanApprox = sft$fitIndices[,5]) %>% 
    ggplot() + 
    geom_line(aes(x = Indices, y = meanApprox), color = "red", alpha = 0.1, size = 2.5) +
    geom_text(mapping = aes(x = Indices, y = meanApprox, label = Indices), color = "red", size = 4) +
    xlab("Soft Threshold (power)") +
    ylab("Mean Connectivity") +
    geom_segment(aes(x = st-0.4, 
                   y = sft$fitIndices$mean.k.[idx], 
                   xend = 0, 
                   yend = sft$fitIndices$mean.k.[idx]),
               arrow = arrow(length = unit(0.2,"cm")), 
               size = 0.4) +
    ggtitle(paste0("Mean connectivity: ", 
                 round(sft$fitIndices$mean.k.[idx],2))) -> mean_connectivity_plot

  cowplot::plot_grid(scale_independence_plot, mean_connectivity_plot, ncol = 2, align = "h", 
                     labels = c("A", "B")) -> si_mc_plot
  WGCNA_list[[Datasetname]][["plots"]][["si_mc_plot"]] <- si_mc_plot

  ggsave(si_mc_plot, filename = paste(paste(save_name, "soft_thresholding_power" , Datasetname,
        sep="_"),".png", sep=""), path = pathPlots, device='png', dpi=300, width = 8, height = 6)
  si_mc_plot

  softPower = st; print(paste("SoftPower chosen to", softPower))
  
  WGCNA_list[[Datasetname]][["softPower"]] <- softPower
}
## Allowing multi-threading with up to 8 threads.
## pickSoftThreshold: will use block size 1552.
##  pickSoftThreshold: calculating connectivity for given powers...
##    ..working on genes 1 through 1552 of 1552
##    Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
## 1      1    0.194 -1.080          0.586 231.000  2.21e+02  451.0
## 2      2    0.739 -1.530          0.742  59.600  4.98e+01  206.0
## 3      3    0.932 -1.570          0.915  22.200  1.44e+01  125.0
## 4      4    0.917 -1.420          0.914  11.100  4.94e+00   90.1
## 5      5    0.909 -1.280          0.908   6.750  1.90e+00   72.0
## 6      6    0.907 -1.160          0.903   4.730  8.20e-01   60.8
## 7      7    0.927 -1.080          0.926   3.600  3.70e-01   52.9
## 8      8    0.949 -1.030          0.941   2.900  1.83e-01   47.1
## 9      9    0.945 -1.000          0.936   2.410  8.96e-02   42.5
## 10    10    0.931 -0.979          0.915   2.060  4.43e-02   38.7
## 11    12    0.913 -0.957          0.893   1.580  1.19e-02   32.9
## 12    14    0.938 -0.941          0.921   1.260  3.53e-03   28.6
## 13    16    0.961 -0.916          0.950   1.040  1.06e-03   25.2
## 14    18    0.950 -0.914          0.936   0.877  3.17e-04   22.5
## 15    20    0.922 -0.922          0.900   0.750  1.09e-04   20.3
## [1] "SoftPower chosen to 3"
## Allowing multi-threading with up to 8 threads.
## pickSoftThreshold: will use block size 1344.
##  pickSoftThreshold: calculating connectivity for given powers...
##    ..working on genes 1 through 1344 of 1344
##    Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
## 1      1   0.0484  1.040          0.955 193.000  1.92e+02 277.00
## 2      2   0.1070 -0.667          0.930  45.900  4.40e+01  96.60
## 3      3   0.7140 -1.630          0.892  15.100  1.33e+01  53.20
## 4      4   0.9340 -1.860          0.958   6.530  4.71e+00  37.10
## 5      5   0.9530 -1.730          0.946   3.490  1.91e+00  29.20
## 6      6   0.9710 -1.550          0.966   2.190  8.44e-01  24.40
## 7      7   0.9840 -1.410          0.983   1.550  3.96e-01  21.10
## 8      8   0.9820 -1.310          0.982   1.180  1.97e-01  18.60
## 9      9   0.9770 -1.240          0.975   0.944  1.02e-01  16.70
## 10    10   0.9420 -1.200          0.928   0.786  5.48e-02  15.00
## 11    12   0.9680 -1.100          0.963   0.585  1.70e-02  12.40
## 12    14   0.9280 -1.040          0.912   0.463  5.66e-03  10.50
## 13    16   0.9550 -0.974          0.942   0.381  1.96e-03   8.89
## 14    18   0.9370 -0.936          0.921   0.322  7.01e-04   7.65
## 15    20   0.9000 -0.921          0.872   0.277  2.58e-04   6.93
## [1] "SoftPower chosen to 4"
## Allowing multi-threading with up to 8 threads.
## pickSoftThreshold: will use block size 2364.
##  pickSoftThreshold: calculating connectivity for given powers...
##    ..working on genes 1 through 2364 of 2364
##    Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
## 1      1   0.1460  1.180          0.959  679.00   667.000 1020.0
## 2      2   0.0603 -0.389          0.918  283.00   266.000  573.0
## 3      3   0.4100 -0.917          0.942  143.00   125.000  365.0
## 4      4   0.6520 -1.180          0.954   80.90    66.500  252.0
## 5      5   0.7580 -1.290          0.957   50.20    39.100  183.0
## 6      6   0.8220 -1.340          0.952   33.30    24.200  139.0
## 7      7   0.8580 -1.350          0.938   23.30    15.700  109.0
## 8      8   0.9270 -1.320          0.962   17.10    10.400   87.5
## 9      9   0.9580 -1.310          0.981   12.90     7.090   73.5
## 10    10   0.9530 -1.340          0.970   10.10     4.920   64.7
## 11    12   0.9550 -1.390          0.970    6.64     2.570   53.3
## 12    14   0.9640 -1.420          0.988    4.70     1.450   45.7
## 13    16   0.9590 -1.400          0.980    3.52     0.874   40.3
## 14    18   0.9530 -1.390          0.979    2.76     0.548   36.1
## 15    20   0.9500 -1.370          0.974    2.24     0.352   32.9
## [1] "SoftPower chosen to 8"
9.2.2.2 blockwiseModules

deepSplit integer value between 0 and 4. Provides a simplified control over how sensitive module detection should be to module splitting, with 0 least and 4 most sensitive. See cutreeDynamic for more details.

https://support.bioconductor.org/p/104602/ mergeCutHeight I am not aware of a principle from which one could derive an “appropriate” value, but in practice, on data sets with 50-100 samples, using 0.15 to 0.2 has worked fairly well. For fewer samples a larger value (0.25 to 0.3) may be warranted. If you want larger modules, increase the value; if you want smaller modules at the risk of having redundant modules (modules with very similar functional annotation and very similar fuzzy module membership), you can decrease the value to say 0.10, maybe even lower if you have lots of samples (hundreds or more). https://support.bioconductor.org/p/101579/ reassignThreshold First things first: grey is not really a module, it is a label for unassigned genes, and the eigengene and kME for the grey “module” are more or less meaningless. In other words, ignore the eigengene and kME values for the grey “module”.

WGCNA assigns module labels using dynamic tree cut (look up dynmaicTreeCut) of hierarchical clustering tree that is based on the Toplogical Overlap Measure (TOM). TOM results in similar but not quite the same similarity as correlation, hence for some genes the assigned module may differ from the module with highest kME. Module merging can also play a part here.

Practically speaking, genes will have a high kME to their assigned module. When assigned module and module of highest kME differ, the gene probably has high kME to both and can be considered intermediate between the two modules.

I don’t really recommend this, but if you absolutely want all genes to be assigned to the module of highest kME, try using argument reassignThreshold=1 to blockwiseModules. This will re-assign all genes to the module of their highest kME after the initial modules have been identified. Note though that the reassignment is not iterated with module eigengene re-calculation.

In all, I don’t worry about the module assignment vs. max. kME differences in my own analyses, and I recommend not worrying it about it to others as well.

https://cran.r-project.org/web/packages/WGCNA/WGCNA.pdf maxPOutliers maxPOutliers specifies the maximum percentile of data that can be considered outliers on either side of the median separately. For each side of the median, if higher percentile than maxPOutliers is considered an outlier by the weight function based on 9*mad(x), the width of the weight function is increased such that the percentile of outliers on that side of the median equals maxPOutliers. Using maxPOutliers=1 will effectively disable all weight function broadening; using maxPOutliers=0 will give results that are quite similar (but not equal to) Pearson correlation.

minCoreKMESize a number between 0 and 1. If a detected module does not have at least minModuleKMESize genes with eigengene connectivity at least minCoreKME, the module is disbanded (its genes are unlabeled and returned to the pool of genes waiting for mofule detection).

minKMEtoStay
genes whose eigengene connectivity to their module eigengene is lower than minKMEtoStay are removed from the module.

for (Datasetname  in names(WGCNA_list)) {
  
  library(WGCNA)
  # The following setting is important, do not omit.
  options(stringsAsFactors = FALSE)
  
  omics_data <- WGCNA_list[[Datasetname]]$omics_data
  datTraits  <- WGCNA_list[[Datasetname]]$datTraits
  softPower  <- WGCNA_list[[Datasetname]]$softPower
  
  ##################
  #blockwiseModules#
  ##################
  print(Datasetname)
  print(paste("SoftPower chosen to", softPower))
  
  if (Datasetname %in% c("OE", "GC")) {
  maxPOutliers_value    <- 0.05
  mergeCutHeight_value  <- 0.15
  deepSplit_value       <- 3
  } else if (Datasetname  %in% c("WF")) {
  maxPOutliers_value    <- 0.1
  mergeCutHeight_value  <- 0.3
  deepSplit_value       <- 2
  }
  
  network = blockwiseModules(omics_data, 
                       power = st,
                       networkType = "signed", 
                       TOMType = "signed",
                       corType = "bicor",
                       maxPOutliers = maxPOutliers_value,
                       minModuleSize = 3, 
                       #reassignThreshold = 0, 
                       minCoreKME = 0.5,      # Default 0.5
                       #minCoreKMESize = 3,   # Default minModuleSize/3,
                       minKMEtoStay = 0.5,    # Default 0.3 High intramodular correlation
                       deepSplit = deepSplit_value,
                       mergeCutHeight = mergeCutHeight_value,
                       numericLabels = TRUE, 
                       pamRespectsDendro = FALSE,
                       saveTOMs = TRUE,
                       saveTOMFileBase = paste(Species,Tissue,Type,Season,Datasetname, "TOM", sep="_"), 
                       verbose = 3)
  
  
  

#Strand et., 2021
# network <- blockwiseModules(omics_data,
#                           power = st,
#                           networkType = "signed",
#                           TOMType = "signed",
#                           corType = "bicor",
#                           maxPOutliers = 0.05,
#                           deepSplit = 4, # Default 2
#                           minModuleSize = 2, # 30
#                           minCoreKME = 0.5,      # Default 0.5
#                           minCoreKMESize = 2,    # Default minModuleSize/3,
#                           minKMEtoStay = 0.5,    # Default 0.3
#                           reassignThreshold = 0, # Default 1e-6
#                           mergeCutHeight = 0.4,  # Default 0.15
#                           pamStage = FALSE,
#                           pamRespectsDendro = TRUE,
#                           replaceMissingAdjacencies = TRUE,
#                           numericLabels = TRUE,
#                           saveTOMs = TRUE,
#                           saveTOMFileBase = paste(Species,Tissue,Type,Season, "TOM", sep="_"),
#                           verbose = 3)


  moduleLabels <- network$colors
  moduleColors <- labels2colors(network$colors)
  MEs <- network$MEs
  geneTree <- network$dendrograms[[1]]
  
  WGCNA_list[[Datasetname]][["MEs"]] <- MEs
  WGCNA_list[[Datasetname]][["moduleLabels"]] <- moduleLabels
  WGCNA_list[[Datasetname]][["moduleColors"]] <- moduleColors
  WGCNA_list[[Datasetname]][["geneTree"]] <- geneTree
  WGCNA_list[[Datasetname]][["network"]] <- network #we also save the whole network as its small from 16S

}
## [1] "OE"
## [1] "SoftPower chosen to 3"
##  Calculating module eigengenes block-wise from all genes
##    Flagging genes and samples with too many missing values...
##     ..step 1
##  ..Working on block 1 .
##     TOM calculation: adjacency..
##     ..will use 8 parallel threads.
##      Fraction of slow calculations: 0.000000
##     ..connectivity..
##     ..matrix multiplication (system BLAS)..
##     ..normalization..
##     ..done.
##    ..saving TOM for block 1 into file OE_GC_Gill_16S_SU_AU_WI_SP_OE_TOM-block.1.RData
##  ....clustering..
##  ....detecting modules..
##  ....calculating module eigengenes..
##  ....checking kME in modules..
##      ..removing 239 genes from module 1 because their KME is too low.
##      ..removing 168 genes from module 2 because their KME is too low.
##      ..removing 147 genes from module 3 because their KME is too low.
##      ..removing 38 genes from module 4 because their KME is too low.
##      ..removing 29 genes from module 5 because their KME is too low.
##      ..removing 42 genes from module 6 because their KME is too low.
##      ..removing 28 genes from module 7 because their KME is too low.
##      ..removing 27 genes from module 8 because their KME is too low.
##      ..removing 5 genes from module 9 because their KME is too low.
##      ..removing 3 genes from module 10 because their KME is too low.
##      ..removing 5 genes from module 11 because their KME is too low.
##   ..reassigning 7 genes from module 2 to modules with higher KME.
##   ..reassigning 1 genes from module 4 to modules with higher KME.
##   ..reassigning 1 genes from module 9 to modules with higher KME.
##  ..merging modules that are too close..
##      mergeCloseModules: Merging modules whose distance is less than 0.15
##        Calculating new MEs...
## [1] "GC"
## [1] "SoftPower chosen to 4"
##  Calculating module eigengenes block-wise from all genes
##    Flagging genes and samples with too many missing values...
##     ..step 1
##  ..Working on block 1 .
##     TOM calculation: adjacency..
##     ..will use 8 parallel threads.
##      Fraction of slow calculations: 0.000000
##     ..connectivity..
##     ..matrix multiplication (system BLAS)..
##     ..normalization..
##     ..done.
##    ..saving TOM for block 1 into file OE_GC_Gill_16S_SU_AU_WI_SP_GC_TOM-block.1.RData
##  ....clustering..
##  ....detecting modules..
##  ....calculating module eigengenes..
##  ....checking kME in modules..
##      ..removing 123 genes from module 1 because their KME is too low.
##      ..removing 175 genes from module 2 because their KME is too low.
##      ..removing 60 genes from module 3 because their KME is too low.
##      ..removing 58 genes from module 4 because their KME is too low.
##      ..removing 58 genes from module 5 because their KME is too low.
##      ..removing 14 genes from module 6 because their KME is too low.
##      ..removing 6 genes from module 7 because their KME is too low.
##      ..removing 11 genes from module 8 because their KME is too low.
##      ..removing 10 genes from module 9 because their KME is too low.
##      ..removing 6 genes from module 10 because their KME is too low.
##      ..removing 3 genes from module 11 because their KME is too low.
##      ..removing 1 genes from module 13 because their KME is too low.
##      ..removing 6 genes from module 14 because their KME is too low.
##      ..removing 2 genes from module 15 because their KME is too low.
##      ..removing 1 genes from module 16 because their KME is too low.
##   ..reassigning 3 genes from module 1 to modules with higher KME.
##  ..merging modules that are too close..
##      mergeCloseModules: Merging modules whose distance is less than 0.15
##        Calculating new MEs...
## [1] "WF"
## [1] "SoftPower chosen to 8"
##  Calculating module eigengenes block-wise from all genes
##    Flagging genes and samples with too many missing values...
##     ..step 1
##  ..Working on block 1 .
##     TOM calculation: adjacency..
##     ..will use 8 parallel threads.
##      Fraction of slow calculations: 0.000000
##     ..connectivity..
##     ..matrix multiplication (system BLAS)..
##     ..normalization..
##     ..done.
##    ..saving TOM for block 1 into file OE_GC_Gill_16S_SU_AU_WI_SP_WF_TOM-block.1.RData
##  ....clustering..
##  ....detecting modules..
##  ....calculating module eigengenes..
##  ....checking kME in modules..
##      ..removing 49 genes from module 1 because their KME is too low.
##      ..removing 46 genes from module 2 because their KME is too low.
##      ..removing 10 genes from module 3 because their KME is too low.
##      ..removing 8 genes from module 4 because their KME is too low.
##      ..removing 30 genes from module 5 because their KME is too low.
##      ..removing 14 genes from module 6 because their KME is too low.
##      ..removing 8 genes from module 7 because their KME is too low.
##      ..removing 11 genes from module 8 because their KME is too low.
##      ..removing 5 genes from module 9 because their KME is too low.
##      ..removing 6 genes from module 10 because their KME is too low.
##      ..removing 17 genes from module 11 because their KME is too low.
##      ..removing 1 genes from module 12 because their KME is too low.
##      ..removing 1 genes from module 13 because their KME is too low.
##      ..removing 4 genes from module 14 because their KME is too low.
##      ..removing 1 genes from module 15 because their KME is too low.
##      ..removing 6 genes from module 16 because their KME is too low.
##      ..removing 5 genes from module 17 because their KME is too low.
##      ..removing 7 genes from module 18 because their KME is too low.
##      ..removing 10 genes from module 20 because their KME is too low.
##      ..removing 4 genes from module 21 because their KME is too low.
##      ..removing 9 genes from module 22 because their KME is too low.
##      ..removing 6 genes from module 23 because their KME is too low.
##      ..removing 9 genes from module 24 because their KME is too low.
##      ..removing 1 genes from module 25 because their KME is too low.
##      ..removing 4 genes from module 26 because their KME is too low.
##      ..removing 4 genes from module 27 because their KME is too low.
##      ..removing 1 genes from module 28 because their KME is too low.
##      ..removing 1 genes from module 30 because their KME is too low.
##      ..removing 1 genes from module 31 because their KME is too low.
##      ..removing 10 genes from module 32 because their KME is too low.
##      ..removing 2 genes from module 33 because their KME is too low.
##      ..removing 2 genes from module 35 because their KME is too low.
##      ..removing 1 genes from module 36 because their KME is too low.
##      ..removing 2 genes from module 38 because their KME is too low.
##  ..merging modules that are too close..
##      mergeCloseModules: Merging modules whose distance is less than 0.3
##        Calculating new MEs...
9.2.2.3 Module & Network Visualization
for (Datasetname  in names(WGCNA_list)) {
  
  library(WGCNA)
  # The following setting is important, do not omit.
  options(stringsAsFactors = FALSE)
  
  omics_data    <- WGCNA_list[[Datasetname]]$omics_data
  datTraits     <- WGCNA_list[[Datasetname]]$datTraits
  moduleLabels  <- WGCNA_list[[Datasetname]]$moduleLabels
  moduleColors  <- WGCNA_list[[Datasetname]]$moduleColors
  MEs           <- WGCNA_list[[Datasetname]]$MEs
  network       <- WGCNA_list[[Datasetname]]$network
  ps            <- WGCNA_list[[Datasetname]]$ps
  
  #####################
  #plotDendroAndColors#
  #####################

  ##############################
  #Number of ASV in each module#
  ##############################
  as.data.frame(table(moduleLabels)) %>% 
  dplyr::rename(Module = moduleLabels, Size = Freq) %>%
  dplyr::mutate(Module_color = labels2colors(as.numeric(as.character(Module)))) -> module_size
  
  module_size %>% 
    ggplot(aes(x = Module, y = Size, fill = Module)) +
    geom_col(color =  "#000000") +
    ggtitle("Number of genes in each module") +
    theme(legend.position = "none") + 
    scale_fill_manual(values = setNames(module_size$Module_color,module_size$Module)) +
    geom_text(aes(label = Size),vjust = 0.5, hjust = -0.18, size = 3.5) +
    ylim(0, max(module_size$Size)*1.1) +
    theme(plot.margin = margin(2, 2, 2, 2, "pt")) +
    coord_flip() -> ASVModuleSizePlot
  ASVModuleSizePlot
  
  ggsave(ASVModuleSizePlot, filename = paste(paste(save_name, "ASVModulePlot" , Datasetname, sep="_"),
          ".png", sep=""), path = pathPlots, device='png', dpi=300, width = 8, height = 6)

  WGCNA_list[[Datasetname]][["plots"]][["ASVModuleSizePlot"]] <-ASVModuleSizePlot
  
  
  #######################################
  #Module-Eigenegene-Correlation-Heatmap#
  #######################################
  MEs_R <- bicor(MEs, MEs, maxPOutliers = 0.05)
  idx.r <- which(rownames(MEs_R) == "ME0")
  idx.c <- which(colnames(MEs_R) == "ME0")
  MEs_R_noME0 <- MEs_R[-idx.r, -idx.c]

  MEs_R_noME0[upper.tri(MEs_R_noME0)] %>% 
    as.data.frame() %>% 
    dplyr::rename("correlation" = ".") %>% 
    ggplot(aes(x=correlation)) + 
    geom_histogram(bins = 20) +
    #geom_density() + 
    xlim(-1, 1) +
    ggtitle(paste0(prefix,"ME correlation\n w/o ",prefix ,"ME0")) -> MEs_R_density

  WGCNA_list[[Datasetname]][["plots"]][["MEs_R_density"]] <-MEs_R_density
  

  pheatmap::pheatmap(MEs_R_noME0, color = colorRampPalette(c("Blue", "White", "Red"))(100),
                   silent = T, 
                   breaks = seq(-1,1,length.out = 101),
                   treeheight_row = 5, 
                   treeheight_col = 5,
                   main = paste0(prefix,"ME correlation heatmap w/o ",prefix ,"ME0"),
                   labels_row = paste0(prefix, rownames(MEs_R)),
                   labels_col = paste0(prefix, colnames(MEs_R))) -> MEs_R_Corr

  ggsave(MEs_R_Corr, filename = paste(paste(save_name, "MEs_R_Corr" , Datasetname, sep="_"),".png", sep=""),
         path = pathPlots, device='png', dpi=300, width = 8, height = 6)
  WGCNA_list[[Datasetname]][["plots"]][["MEs_R_Corr"]] <-MEs_R_Corr

  MEs_R_Corr
  
  cowplot::plot_grid(MEs_R_density, MEs_R_Corr$gtable, labels = c("D", "E"), rel_widths = c(0.6, 1)) ->
    density_eigen
  density_eigen


  #######################
  #Network-Visualization#
  #######################
  # 5 Visualization of networks within R
  # 5.a Visualizing the gene network
  # One way to visualize a weighted network is to plot its heatmap, Fig. 1. Each row and column of the heatmap
  # correspond to a single gene. The heatmap can depict adjacencies or topological overlaps, with light colors   denoting
  # low adjacency (overlap) and darker colors higher adjacency (overlap). In addition, the gene dendrograms   and module
  # colors are plotted along the top and left side of the heatmap. The package provides a convenient function to create
  # such network plots; Fig. 1 was created using the following code. This code can be executed only if the network
  # was calculated using a single-block approach (that is, using the 1-step automatic or the step-by-step tutorials). If
  # the networks were calculated using the block-wise approach, the user will need to modify this code to perform the
  # visualization in each block separately. The modi cation is simple and we leave it as an exercise for the interested
  # reader.
  # Calculate topological overlap anew: this could be done more efficiently by saving the TOM
  # calculated during module detection, but let us do it again here.

  # dissTOM = 1-TOMsimilarityFromExpr(omics_data, power = softPower);
  # # Transform dissTOM with a power to make moderately strong connections more visible in the heatmap
  # plotTOM = dissTOM^10;
  # # Set diagonal to NA for a nicer plot
  # diag(plotTOM) = NA;
  # # Call the plot function
  # TOMplot(plotTOM, geneTree, moduleColors, main = "Network heatmap plot, all genes")

  #############################
  # Correlation within modules# from Strand et al, 2021
  #############################
  corr_within_module <- function(omics_data, network, module_x = 1){
  idx.omics_data <- which(network$colors == module_x)
  idx.me <- which(colnames(network$MEs) == paste0("ME",module_x))
  kME_x <- bicor(omics_data[,idx.omics_data], network$MEs[,idx.me], maxPOutliers = 0.05)
  kME_x}
  ggplot.list <- list()
  for(m in colnames(network$MEs)){
    h <- as.numeric(sub("ME","", m))
    data.frame(x = suppressWarnings(corr_within_module(omics_data = omics_data, network = network, 
                                                       module_x = h))) %>% 
    ggplot() + 
    geom_histogram(aes(x), fill = labels2colors(h), color = "black", alpha = 0.5, bins = 20) + 
    xlim(-1, 1) +
    xlab("ASV correlation")+
    ggtitle(paste0(prefix,m)) -> da_plot
  ggplot.list[[m]] <- da_plot}
  ggplot.list <- ggplot.list[ggplot.list %>% names() %>% sub("ME", "", .) %>% as.numeric() %>% order()]

  cowplot::plot_grid(plotlist = ggplot.list, ncol = 3) -> density_all_plot
  density_all_plot

  ggsave(density_all_plot, filename = paste(paste(save_name, "WithinModuleCorrelation" ,Datasetname, sep="_"),".png", sep=""), path = pathPlots, device='png', dpi=300, width = 8, height = 12)

WGCNA_list[[Datasetname]][["plots"]][["density_all_plot"]] <-density_all_plot

}

##############
#Summary Plot#
##############
for (Datasetname  in names(WGCNA_list)) {
  
  plots <- WGCNA_list[[Datasetname]][["plots"]]

  cowplot::plot_grid(plots$si_mc_plot, plots$ASVModuleSizePlot, labels = c("","C"), ncol = 2) -> part_1

  cowplot::plot_grid(part_1, plots$density_eigen, labels = c("", ""), ncol = 1, rel_widths = c(1,0.5)) -> part_2

  cowplot::plot_grid(part_2, plots$density_all_plot, labels = c("", "F"), rel_widths = c(1,0.1), ncol = 1) -> part_3

  ggsave(part_2, filename = paste(paste(save_name, "SSU_Network", Datasetname, Date, sep="_"),".png", sep=""),
         path = pathPlots , device='png', dpi=300, width = 12, height = 10)
  
  plot(part_2)
  
}

9.2.4 Module-Trait-Correlation Heatmap

for (Datasetname  in names(WGCNA_list)) {

  omics_data    <- WGCNA_list[[Datasetname]]$omics_data
  datTraits     <- WGCNA_list[[Datasetname]]$datTraits
  moduleLabels  <- WGCNA_list[[Datasetname]]$moduleLabels
  moduleColors  <- WGCNA_list[[Datasetname]]$moduleColors
  MEs           <- WGCNA_list[[Datasetname]]$MEs
  network       <- WGCNA_list[[Datasetname]]$network
  traitData       <- WGCNA_list[[Datasetname]]$traitData
  
  ###################
  #Self made Heatmap#
  ###################
  #https://bioinformaticsworkbook.org/tutorials/wgcna.html#gsc.tab=0
  # Define numbers of genes and samples
  MEs = orderMEs(MEs)
  names(MEs) = names(MEs) %>% gsub("ME","", .)  %>% paste("SSU",., sep="")
  moduleTraitCor = cor(MEs, datTraits, use = "p");
  moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples);
  textMatrix <- paste(signif(moduleTraitCor, 2), "\n(", signif(moduleTraitPvalue, 1), ")", sep = "")
  dim(textMatrix) <- dim(moduleTraitCor)

  # Add treatment names
  MEs$treatment = row.names(MEs)
  mat <- as.data.frame(t(moduleTraitCor))
  mat$traits <- rownames(mat)
  # tidy & plot data
  module_order = names(MEs) 

  mME = mat %>%
  pivot_longer(-traits) %>%
  mutate(#name = gsub("ME", "", name),
    name = factor(name, levels = module_order))

  textMatrixLong <-  as.data.frame(t(signif(moduleTraitCor, 2)))
  textMatrixLong$traits <- rownames(textMatrixLong)
  textMatrixLong = textMatrixLong %>%
    pivot_longer(-traits) %>%
    mutate(
    #name = gsub("ME", "", name),
    name = factor(name, levels = module_order))
  textMatrixLong <- as.data.frame(textMatrixLong)

  textMatrixLong2 <-  as.data.frame(t(signif(moduleTraitPvalue, 1)))
  textMatrixLong2$traits <- rownames(textMatrixLong2)
  textMatrixLong2 = textMatrixLong2%>%
    pivot_longer(-traits) %>%
    mutate(
    name = gsub("ME", "", name),
    name = factor(name, levels = module_order))
  textMatrixLong2 <- as.data.frame(textMatrixLong2)

  ## add gene counts per module
  genesmod<- as.data.frame(moduleLabels)
  genesmod$genes <- rownames(genesmod)
  genesmod$Modules <- paste("SSU",genesmod$moduleLabels, sep="") #labels2colors(genesmod$moduleLabels)
 
  ModCount <- as.data.frame(genesmod %>% 
    dplyr::group_by(Modules) %>% 
    dplyr::summarise(n = n()) )
  ModCount <- ModCount[order(as.numeric(ModCount$n), decreasing = T),]


  HM <- mME %>% ggplot(., aes(x=traits, y=name, fill=value)) +
    geom_tile() +
    scale_fill_gradient2(
      low = "steelblue1",
      high = "red",
      mid = "white",
      midpoint = 0,
      limit = c(-0.7,0.7)) +
  theme(axis.text.x = element_text(angle=90)) +
  labs(title = paste("Module-trait Relationships", Datasetname, sep = " "), y = "Modules", fill="corr") +
  geom_text(aes(label=textMatrixLong$value), position=position_nudge(y=0.2), 
                      size=2.5, colour="grey20") +
  geom_text(aes(label=paste0("(",textMatrixLong2$value,")")), position=position_nudge(y=-0.2), 
                      colour="grey20", size=2.5) +
   theme_minimal() + atheme + theme(axis.title.x = element_blank()) +
      theme( panel.grid.major = element_blank(),panel.grid.minor = element_blank())
      prow <- cowplot::plot_grid(HM, labels = c(""), ncol = 1)
      
  ModuleHeatmap<- plot_grid(prow, ncol = 1, rel_heights = c(0.02, 0.05, 0.98))
      
  ggsave(ModuleHeatmap, filename = paste(paste(save_name, "ModuleHeatmap", Datasetname, Date, sep="_"), 
          ".png", sep=""), path = pathPlots, device='png', dpi=300, width = 8, height = 6)

  WGCNA_list[[Datasetname]][["plots"]][["ModuleHeatmap"]] <- ModuleHeatmap

  
  
 
  mME<- mME%>% mutate(Category = case_when((traits %in% c(
  "NH4", "NO2", "NO3", "O2", "PO4", "TOC", "Temp",  "SPM", "Salinity"
  )) ~ "Abiotics", 
  (traits %in% c(
  "HSI", "SSI", "GSI", "FCF", "Age", "Length", "FillLevel")) ~ "Physiology"))
  
  Order<- c("Abiotics", "Physiology")
 
  HM <- mME %>% ggplot(., aes(x=name, y=factor(traits, levels=traitData), fill=value)) +
  geom_tile() +
    scale_fill_gradientn(
      colours = c("steelblue1", "white", "red"),
      limit = c(-1,1), 
      values = scales::rescale(c(-1, -0.3, 0, 0.3, 1))) +
  
  scale_x_discrete(limits=ModCount$Modules, labels=paste(ModCount$Modules, ": ", ModCount$n)) +
  facet_grid(factor(mME$Category, levels=Order), scales = "free", space = "free") +
  theme(axis.text.x = element_text(angle=90)) +
  labs(title = paste("Module-trait Relationships", Datasetname, sep = " "), y = "Modules", fill="corr") +
  geom_text(aes(label=textMatrixLong$value), position=position_nudge(y=0.2), 
                      size=2.5, colour="grey20") +
  geom_text(aes(label=paste0("(",textMatrixLong2$value,")")), position=position_nudge(y=-0.2), 
                      colour="grey20", size=2.5) +
   theme_minimal() + atheme + theme(axis.title.x = element_blank()) +
      theme( panel.grid.major = element_blank(),panel.grid.minor = element_blank())
  
  ModuleHeatmap2 <- cowplot::plot_grid(HM, labels = c(""), ncol = 1)
      
  ModuleHeatmap2 <- plot_grid(ModuleHeatmap2, ncol = 1, rel_heights = c(0.02, 0.05, 0.98))
      plot(ModuleHeatmap2)
      
  ggsave(ModuleHeatmap2, filename = paste(paste(save_name, "ModuleHeatmap-2", Datasetname, Date, sep="_"),".png",
          sep=""), path = pathPlots, device='png', dpi=300, width = 9, height = 6) 
    
  WGCNA_list[[Datasetname]][["plots"]][["ModuleHeatmap2"]] <- ModuleHeatmap2
  
  ##################
  #Specific Modules#
  ##################
 #  Module <- "darkturquoise"
 #  HM <- mME[mME$name == Module,] %>% ggplot(., aes(x=name, y=factor(traits, levels=traitData), fill=value)) +
 #  geom_tile() +
 #    scale_fill_gradientn(
 #      colours = c("steelblue1", "white", "red"),
 #      limit = c(-1,1), 
 #      values = scales::rescale(c(-1, -0.3, 0, 0.3, 1))) +
 #  # scale_fill_gradient2(
 #  #   low = "steelblue1",
 #  #   high = "red",
 #  #   mid = "white",
 #  #   midpoint = 0,
 #  #   limit = c(-1,1)) 
 #  #facet_grid(factor(mME$Category, levels=Order), scales = "free", space = "free") +
 #  theme(axis.text.x = element_text(angle=90)) +
 #  labs(title = "", y = "Modules", fill="corr") +
 #  geom_text(aes(label=textMatrixLong[textMatrixLong$name == Module,]$value), position=position_nudge(y=0.2), 
 #                      size=3, colour="grey20") +
 #  geom_text(aes(label=paste0("(",textMatrixLong2[textMatrixLong2$name == Module,]$value,")")), position=position_nudge(y=-0.2), 
 #                      colour="grey20", size=3) +
 #   theme_minimal() + atheme + theme(axis.title.x = element_blank()) +
 #      theme( panel.grid.major = element_blank(),panel.grid.minor = element_blank())
 #      prow <- cowplot::plot_grid(HM, labels = c(""), ncol = 1)
 # A<- plot_grid(prow, ncol = 1, rel_heights = c(0.02, 0.05, 0.98))
 #      plot(A)
 #      ggsave(A, filename = paste("WGCNA-ModuleHeatmap", Module, sep="_"), path = pathPlots, 
 #             device='png', dpi=300, width = 2.8,height = 7)

}

9.2.6 Taxa Cluster-Heatmap

for (Datasetname  in names(WGCNA_list)) {
  
  omics_data    <- WGCNA_list[[Datasetname]]$omics_data
  datTraits     <- WGCNA_list[[Datasetname]]$datTraits
  moduleLabels  <- WGCNA_list[[Datasetname]]$moduleLabels
  moduleColors  <- WGCNA_list[[Datasetname]]$moduleColors
  MEs           <- WGCNA_list[[Datasetname]]$MEs
  network       <- WGCNA_list[[Datasetname]]$network
  ps            <- WGCNA_list[[Datasetname]]$ps
  SSUWGCNAlist  <- WGCNA_list[[Datasetname]]$SSUWGCNAlist

    for (MODULE in names(SSUWGCNAlist)) {
      tryCatch({
        
      genes_of_interest <- names(omics_data[moduleLabels == paste(sub("SSU", "", MODULE))])
      print(MODULE)
      #print(head(names(ExpressionSet[moduleLabels == paste(sub("ME", "", MODULE))])))  
      
      FILENAME    <- paste(paste(save_name, "Heatplot", MODULE, Datasetname, sep="_"),".png", sep="")
      
      TITLE       <- draw_label_themeRKwhite(paste(Datasetname, MODULE),
                                    element = "plot.title", x =  0.05, hjust = 0, vjust = 1)
      #For any unknown reason gene names like trnat-ugu_39 get changes by WGCNA to trnat.ugu_39

      rowclusternum  <- 1
      columnclusternum    <- 1

      require(DESeq2)
      require(tidyverse)
      require(ggplot2)

      BacteriaHeatPlotRKnoClust(clr = omics_data, Species = Datasetname, min_count = 10,
                                        genes_of_interest = genes_of_interest, Samples = sample_names(ps), 
                                        SAMDF = sample_data(ps),  TITLE = TITLE, filename= FILENAME,
                                        OutlineColor = "grey20")

    if (MODULE == "ME5") {
      plot(A)
    } #else {
    #print("Plots are saved to the pathPlots")
    #}
      
      }, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
    }
}

#-

9.3 Visualization

9.3.1 Modules of Interest

ModsOfInterst_list <- list()
for (Datasetname  in names(WGCNA_list)) {
  
  omics_data    <- WGCNA_list[[Datasetname]]$omics_data
  datTraits     <- WGCNA_list[[Datasetname]]$datTraits
  moduleLabels  <- WGCNA_list[[Datasetname]]$moduleLabels
  moduleColors  <- WGCNA_list[[Datasetname]]$moduleColors
  MEs           <- WGCNA_list[[Datasetname]]$MEs
  network       <- WGCNA_list[[Datasetname]]$network
  ps            <- WGCNA_list[[Datasetname]]$ps
  traitData     <- WGCNA_list[[Datasetname]]$traitData

  ModsOfInterst_list[[ paste("ModsOfInterest", Datasetname, sep="_")]] <- list()
  
  ###############################################################################
  #Create Dataframes of significant correlation between RNAModules and TraitData#
  ###############################################################################
  p.value_matr   <- corr.value_matr <- matrix(ncol = ncol(datTraits), 
                                          nrow = ncol(MEs), 
                                          dimnames = list(colnames(MEs), 
                                                          colnames(datTraits )))
      for(ii in 1:ncol(MEs)){
        for(j in 1:ncol(datTraits)){
          cor.res <- cor.test(MEs[,ii], datTraits[,j])
          p.value_matr[ii, j] <- cor.res$p.value
          corr.value_matr[ii, j] <- cor.res$estimate
        }
      }
    
      # Correct for number of tests
      p.value_matr.adjust <- p.adjust(p.value_matr, method = "fdr")
      dim(p.value_matr.adjust) <- dim(p.value_matr)
      dimnames(p.value_matr.adjust) <- list(colnames(MEs), colnames(datTraits))
      
      # Collect all results into a list.
      Traits_corr <- list(p_value =as.data.frame(p.value_matr), 
                            p_value_adj = as.data.frame(p.value_matr.adjust),
                            correlation = as.data.frame(corr.value_matr))
      ModsOfInterst <- list()
        for (iii in names(Traits_corr$correlation)){
          #aa <- length(ModsOfInterst)
          # if (iii %in% c("NONE")) {
          #   A <- Traits_corr$correlation[iii][Traits_corr$correlation[iii] > abs(0.3), , drop=FALSE] 
          #   B <- Traits_corr$p_value_adj[iii][Traits_corr$p_value_adj[iii] < 0.05, , drop = FALSE]
          #   BB <- B[rownames(B) %in% rownames(A), , drop = FALSE]
          #   BBB <- cbind(A[rownames(A) %in% rownames(BB), , drop = FALSE], BB)
          #   names(BBB) <- c("correlation", "p_value_adj")
          # }
          # else {
            A <- Traits_corr$correlation[iii][abs(Traits_corr$correlation[iii]) > 0.3, , drop=FALSE] 
            B <- Traits_corr$p_value_adj[iii][Traits_corr$p_value_adj[iii] < 0.05, , drop = FALSE]
            BB <- B[rownames(B) %in% rownames(A), , drop = FALSE]
            BBB <- cbind(A[rownames(A) %in% rownames(BB), , drop = FALSE], BB)
            names(BBB) <- c("correlation", "p_value_adj")
          # }
          ModsOfInterst[[iii]] <- BBB

          }
    ModsOfInterst <- Filter(function(df) nrow(df) > 0, ModsOfInterst)
    ModsOfInterst_list[[ paste("ModsOfInterest", Datasetname, sep="_")]] <- ModsOfInterst
    WGCNA_list[[Datasetname]][["ModsOfInterst_list"]] <- ModsOfInterst_list 
}


  #############
  #Save as CSV#
  #############
for (Datasetname  in names(WGCNA_list)) {

  data <- WGCNA_list[[Datasetname]]$ModsOfInterst_list[[paste("ModsOfInterest",
                                                                            Datasetname, sep="_")]]
  
  #data.frame(ID = rep(names(data), sapply(data, length)), Obs = unlist(data))
  #data.table::rbindlist(data, idcol="df")
  data <- data  %>% 
  imap_dfr(~ .x %>% 
             mutate(TraitOfInterest = .y))

  
  split_names <- str_split_fixed(rownames(data), "\\.", 2)
  data <- cbind(
    data,
    Module = split_names[, 1])
  rownames(data) <- NULL
  data$MODULE <- sub("ME", "", data$Module)
  data$Mod <- paste(Datasetname, data$MODULE, sep="")
  #data <- data %>% 
  #  arrange(MODULE)
 

  write.csv2(data , file = paste0(file.path(path_Output_16S, 
                paste(paste("SSU_ModulesOfInterst", Datasetname, sep="_"), 
                      "csv", sep="."))))
}

9.3.2 .csv module ASV & GS

for (Datasetname  in names(WGCNA_list)) {
  
  omics_data    <- WGCNA_list[[Datasetname]]$omics_data
  datTraits     <- WGCNA_list[[Datasetname]]$datTraits
  moduleLabels  <- WGCNA_list[[Datasetname]]$moduleLabels
  moduleColors  <- WGCNA_list[[Datasetname]]$moduleColors
  MEs           <- WGCNA_list[[Datasetname]]$MEs
  network       <- WGCNA_list[[Datasetname]]$network
  ps            <- WGCNA_list[[Datasetname]]$ps
  ModsOfInterest <- WGCNA_list[[Datasetname]]$ModsOfInterst_list[[paste("ModsOfInterest",
                                                                            Datasetname, sep="_")]]
  data <-  ModsOfInterest  %>% 
  imap_dfr(~ .x %>% 
             mutate(TraitOfInterest = .y))

  
  split_names <- str_split_fixed(rownames(data), "\\.", 2)
  data <- cbind(
    data,
    Module = split_names[, 1])
  rownames(data) <- NULL
  data$MODULE <- sub("ME", "", data$Module)
  data$Mod <- paste(Datasetname, data$MODULE, sep="")
  #############################################
  #Create Bacterial Counts and Reseq Dataframe#
  #############################################  

  TAX <- as.data.frame(tax_table(ps))
  OTU <- as.data.frame(t(otu_table(ps)))
  REFSEQ <- refseq(ps)
  REFSEQ <- stack(as.character(REFSEQ, use.names=TRUE))
  colnames(REFSEQ)[colnames(REFSEQ) == "ind"] <- "ASV"

  SeasonSums <- ps %>%
    transform_sample_counts(function(x) {x/sum(x)*100}) %>%
    phyloseq::otu_table() %>%
    as.data.frame() %>%
    rownames_to_column(var = "SampleID") %>%
    dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
    dplyr::group_by(Season) %>%
    dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>% 
    t() %>%
    as.data.frame() %>%
    `colnames<-`(.[1, ]) %>%
    .[-1, ] %>%
    setNames(paste0("avg_", colnames(.))) %>%
    mutate_all(as.numeric) %>%
    rownames_to_column(var="ASV")

    SSUData  <- TAX %>%  
     rownames_to_column(var = "ASV") %>% 
       left_join( #Add relative ASVmeans 
       data.frame(t(phyloseq::otu_table(ps %>%
        transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
          dplyr::mutate(ASVmeans = rowMeans(.)) %>%
          dplyr::mutate(ASV = rownames(.)) %>% 
          dplyr::select(ASV, ASVmeans)
       ) %>%
    left_join( #Add Sums by Season
        SeasonSums
       ) %>%
    left_join( #Add relative ASV data per sample
        data.frame(t(phyloseq::otu_table(ps %>%
        transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
        rownames_to_column(var = "ASV")
    ) %>%
     left_join(REFSEQ
    ) %>%
      dplyr::arrange(desc(ASVmeans))
   
   rownames(SSUData) <- SSUData$ASV

   write.csv2(SSUData, paste0(file.path(path_Output_16S, paste(paste(save_name, 
            paste("SSU_ALL", sep=""), Date, Datasetname,  sep="_"),".csv", sep=""))))
   
  ############################################################################
  #Create WGCNA Dataframe: Species, ModuleMembership, Correlation to Abiotics#
  ############################################################################
  modNames     <- names(MEs)
  nSamples     <- nrow(omics_data)
  SSUWGCNAlist <- list()
  
  for (i in names(MEs)) {
    
    a <- length(SSUWGCNAlist)
    ModuleOfInterst <- paste(sub("ME", "", i))
  
    geneModuleMembership = as.data.frame(cor(omics_data, MEs, use = "p"));
    MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples));
    names(geneModuleMembership) = paste("MM", modNames, sep="");
    names(MMPvalue) = paste("p.MM", modNames, sep="");
    
    
    DatTraitOfInterest <- as.data.frame(datTraits[data[data$Module %in% i,]$TraitOfInterest])

    GS_list <- list()
      if (ncol(DatTraitOfInterest) > 0) {
        for (TraitOfInterest in names(DatTraitOfInterest)) {
      
        geneTraitSignificance = 
        as.data.frame(WGCNA::cor(omics_data, DatTraitOfInterest[TraitOfInterest], use = "p"));
        GSPvalue = as.data.frame(WGCNA::corPvalueStudent(as.matrix(geneTraitSignificance), nSamples));
        names(geneTraitSignificance) = paste("GS.", names(DatTraitOfInterest[TraitOfInterest]), sep="");
        names(GSPvalue) = paste("p.GS.", names(DatTraitOfInterest[TraitOfInterest]), sep="");
        GS_list[[TraitOfInterest]] <- geneTraitSignificance 
      }
    
      align_rows <- function(lst) {
      common_rows <- Reduce(intersect, lapply(lst, rownames))
      aligned_list <- lapply(lst, function(df) df[common_rows, , drop=FALSE])
      return(aligned_list)
      }
      aligned_GS_list <- align_rows(GS_list)
      combined_df <- do.call(cbind, aligned_GS_list)
    } else {
    cat("'TraitOfInterest' does not exist in the names of DatTraitOfInterest.\n")
    }

    OmicsPerModule <- as.data.frame(names(moduleLabels[moduleLabels == ModuleOfInterst]))
    names(OmicsPerModule) <- paste("ME", ModuleOfInterst, sep="")
    rownames(OmicsPerModule) <- OmicsPerModule[,1]
 
    SSUWGCNAdata <- OmicsPerModule %>% rownames_to_column("RowName") %>%
          left_join(geneModuleMembership[rownames(OmicsPerModule),, drop = FALSE][names(geneModuleMembership) %in%
          paste("MMME", ModuleOfInterst, sep="")] %>% rownames_to_column("RowName"), by = "RowName") %>%
          column_to_rownames("RowName")
    
    
   if (ncol(DatTraitOfInterest) > 0) {
      SSUWGCNAdata1 <- SSUWGCNAdata %>% 
        rownames_to_column("RowName") %>%
        left_join(combined_df %>% 
        rownames_to_column("RowName"))
      SSUWGCNAdata1$ASV <- SSUWGCNAdata1$RowName
      SSUWGCNAdata2 <- SSUWGCNAdata1 %>%
        left_join(SSUData[SSUData$ASV %in% SSUWGCNAdata1$RowName,]) %>%
        dplyr::arrange(desc(ASVmeans)) 

    } else {
        SSUWGCNAdata2 <- SSUWGCNAdata %>%
          mutate(ASV = rownames(.)) %>%
        left_join(SSUData[SSUData$ASV %in% rownames(SSUWGCNAdata),]) %>%
        dplyr::arrange(desc(ASVmeans)) 
    }
  
  # SSUWGCNAdata  <- SSUWGCNAdata  %>% rownames_to_column("RowName") %>%
  #         left_join(geneTraitSignificance[rownames(OmicsPerModule),, drop = FALSE][names(geneTraitSignificance) %in%
  #         paste("GS.", TraitOfInterest, sep="")] %>% rownames_to_column("RowName"), by = "RowName") %>%
  #         column_to_rownames("RowName")
  # 

  
  #Reoder by kME/Module Membership
  #SSUWGCNAdata <- SSUWGCNAdata %>%
  #          dplyr::arrange(desc(SSUWGCNAdata[paste("MMME", ModuleOfInterst, sep="")]))

  SSUWGCNAlist[[a+1]] <- SSUWGCNAdata2
  
  names(SSUWGCNAlist)[[a+1]] <- paste("SSU",ModuleOfInterst, sep="")
  
  write.csv2(SSUWGCNAdata2, paste0(file.path(path_Output_16S, paste(paste(save_name, 
            paste("SSU",ModuleOfInterst, sep=""), Date, Datasetname,  sep="_"),".csv", sep=""))))
  
  }
  
  WGCNA_list[[Datasetname]][["SSUWGCNAlist"]] <- SSUWGCNAlist
}
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
## 'TraitOfInterest' does not exist in the names of DatTraitOfInterest.
#Export ASVs as Fasta 
# pslist$ps_SL %>%
#       refseq() %>%
#       Biostrings::writeXStringSet(paste0(file.path(path_Output_16S, "SSUWGCNAlist_SL_21.08.23.asv.fna")), append=FALSE,
#                                   compress=FALSE, compression_level=NA, format="fasta")
# fasta_sequences <- Biostrings::readDNAStringSet(paste0(file.path(path_Output_16S, "SSUWGCNAlist_SL_21.08.23.asv.fna")))
# target_names    <- SSUWGCNAlist$SSU8$ASV
# # Subset the sequences based on the target names
# subset_sequences <- fasta_sequences[names(fasta_sequences) %in% target_names]
# Biostrings::writeXStringSet(subset_sequences, paste0(file.path(path_Output_16S, "SSUWGCNAlist_SL_21.08.23_SSU8.asv.fna")))

#######################################
#Extract and Inspect Sequences on NCBI#
#######################################  
  # minTotRelAbun = 0.001
  # x = taxa_sums(pslist$ps_Mock)
  # keepTaxa = which((x / sum(x)) > minTotRelAbun)
  # prunedSet = prune_taxa(as.character(keepTaxa), pslist$ps_Mock)
  # 
  # prunedSet %>%
  #     refseq() %>%
  #     Biostrings::writeXStringSet("~/asv.fna", append=FALSE,
  #                                 compress=FALSE, compression_level=NA, format="fasta")
saveRDS(WGCNA_list, file = paste0(file.path(path_Output_16S, paste(paste(save_name, "WGCNA_list", Date, sep="_"), ".rds", sep=""))))

9.3.3 Integrated Heatmap OE & GC

hmps_list <- list()
for (Datasetname  in names(WGCNA_list)[grepl("OE|GC", names(WGCNA_list))]) {
  
  omics_data    <- WGCNA_list[[Datasetname]]$omics_data
  datTraits     <- WGCNA_list[[Datasetname]]$datTraits
  moduleLabels  <- WGCNA_list[[Datasetname]]$moduleLabels
  moduleColors  <- WGCNA_list[[Datasetname]]$moduleColors
  RNA_MEs       <- WGCNA_list[[Datasetname]]$MEs
    names(RNA_MEs) <- sub("ME", paste(Datasetname), names(RNA_MEs))
  WF_MEs       <- WGCNA_list[["WF"]]$MEs
    names(WF_MEs) <- sub("ME", "WF", names(WF_MEs))
  network       <- WGCNA_list[[Datasetname]]$network
  ps            <- WGCNA_list[[Datasetname]]$ps
  traitData     <- WGCNA_list[[Datasetname]]$traitData
  SAM_MEs       <- data.frame(sample_data(WGCNA_list[[Datasetname]]$ps))
  
  ASVsums <- as.data.frame(sapply(WGCNA_list[[Datasetname]]$SSUWGCNAlist, function(x) sum(x$ASVmeans))) 
  ASVsums<- ASVsums %>% 
    round(0) %>%
    setNames("ASVsum") %>%
    rownames_to_column(var = "Module") %>%
    dplyr::arrange(desc(ASVsum))
  ASVsums$Module <- sub("SSU", paste(Datasetname), ASVsums$Module)


  # WGCNA_NormalHeatmap_RK(
  #   SAM_MEs = SAM_MEs,
  #   SSU_MEs = MEs,
  #   WIDTH = WIDTH,
  #   HEIGHT = HEIGHT,
  #   traitData = traitData)
  # Normal_hmap
  
  ##############################################################
  #Creating artificially expanded Bakterioplankton ME dataframe#
  ##############################################################
  ps_WF <- pslist[[paste("ps", "WF", sep="_")]]
  
  Names_Fish <- substring(sample_data(ps)$SampleID, first = 3)
  NAME_WF <- substring(rownames(WF_MEs), first = 3)
  
  NAME_WF <- gsub("\\d+$", "", NAME_WF)
  NAME_WF <- substr(NAME_WF , 1, nchar(NAME_WF ) - 2)
  
  Names_Fish_with_matching_WF <- Names_Fish[substr(gsub("\\d+$", "", Names_Fish), 1, 
                                 nchar(gsub("\\d+$", "", Names_Fish)) - 2) %in% NAME_WF]
  print("These Fish samples have real matching Waterfilters from the same Catch")
  print(Names_Fish_with_matching_WF)
  
  print("These will be created from related catches")
  print(setdiff(Names_Fish, Names_Fish_with_matching_WF))
  
  # matching_WF <- sample_data(ps) |>
  #   mutate(Names_Fish = gsub("\\d+$", "", substring(SampleID, first = 3))) |>
  #   mutate(Names_Fish = substr(Names_Fish , 1, nchar(Names_Fish ) - 2))
  # matching_WF <- matching_WF[matching_WF$Names_Fish %in% NAME_WF]

  WF_ME_Expanded <- list()
    for (i in seq_along(Names_Fish)) {
        NAME  <- Names_Fish[i]
        NAME2 <- gsub("\\d+$", "", NAME)
        NAME3 <- substr(NAME2 , 1, nchar(NAME2 ) - 2)
        NAME4 <- substr(NAME3 , 1, nchar(NAME3 ) - 2)

        if (paste("WF", NAME3, "EB", sep="") %in% rownames(WF_MEs)) {
          WF_ME_Expanded[[i]] <- WF_MEs[rownames(WF_MEs) %in% paste("WF", NAME3, "EB", sep=""),]
        } else if (paste("WF", NAME3, "FL", sep="") %in% rownames(WF_MEs)){
          WF_ME_Expanded[[i]] <- WF_MEs[rownames(WF_MEs) %in% paste("WF", NAME3, "FL", sep=""),]
          
        } else if (paste("WF", NAME4, "TWEB", sep="") %in% rownames(WF_MEs)){
          WF_ME_Expanded[[i]] <- WF_MEs[rownames(WF_MEs) %in% paste("WF", NAME4, "TWEB", sep=""),]
          
        } else if (paste("WF", NAME4, "MLFL", sep="") %in% rownames(WF_MEs)){
          WF_ME_Expanded[[i]] <- WF_MEs[rownames(WF_MEs) %in% paste("WF", NAME4, "MLFL", sep=""),]
          
        } else {print(NA)}
        names(WF_ME_Expanded)[[i]] <- paste("WF", NAME, sep="")
  }
  WF_ME_Expanded_df <- do.call(rbind, WF_ME_Expanded)  
  
  ##############################
  #Creating Integraded heatmaps#
  ##############################
  hmps_length <- length(hmps_list)
  FILENAME    <- paste(paste("IntegratedHeatmap", Datasetname, Date, sep="_"),".png", sep="")
        
      WGCNA_IntegratedHeatmap_RK(
        SAM_MEs =  SAM_MEs,
        RNA_MEs =  RNA_MEs,#[names(RNA_MEs) != paste(Datasetname, "0", sep="")],
        SSU_MEs =  WF_ME_Expanded_df[names(  WF_ME_Expanded_df) != "WF0"], 
        WIDTH     = 16,
        HEIGHT  = 5,
        OutlineColor = "grey20",
        traitData = traitData)
        
     hmps_list[[hmps_length+1]] <- ht_list
     names(hmps_list)[[hmps_length+1]] <- paste("ht_list", Datasetname, sep="_")
     

    prow <- cowplot::plot_grid(grid.grabExpr(ComplexHeatmap::draw(ht_list, auto_adjust = FALSE, 
                    background = "transparent",
                   heatmap_legend_side = "left", annotation_legend_side = "bottom")))
    title <- ggdraw() 
    subtitle <- ggdraw() + draw_label_themeRKwhite(paste("Heatmap MEs", sep = " "), 
              element = "plot.subtitle",x = 0.05, hjust = 0, vjust = 1)
    b <- plot_grid(title, subtitle, prow, ncol = 1, rel_heights = c(0.04, 0.05, 0.98)) 
    #Safe the plots in a specified folder <- may have to change the width and height

    plot(prow)
     
}
## [1] "These Fish samples have real matching Waterfilters from the same Catch"
##  [1] "SU21MGEB1" "SU21MGEB4" "SU21MGEB6" "SU21MGEB8" "SU21MGEB9" "SU21MGFL2"
##  [7] "SU21MGFL6" "SU21BBEB5" "SU21BBEB6" "SU21BBEB7" "SU21BBEB8" "SU21BBEB9"
## [13] "SU21BBFL1" "SU21BBFL4" "SU21SSEB6" "SU21SSEB7" "SU21SSEB8" "SU21SSEB9"
## [19] "SU21SSFL1" "SU21SSFL2" "SU21SSFL6" "SU21MLEB1" "SU21MLEB2" "SU21MLEB3"
## [25] "SU21MLFL1" "SU21MLFL2" "AU21TWEB1" "AU21TWEB3" "AU21TWEB5" "AU21TWEB7"
## [31] "AU21TWFL1" "AU21TWFL3" "AU21TWFL7" "WI22BBEB1" "WI22BBEB4" "WI22BBEB5"
## [37] "WI22BBEB7" "WI22BBFL1" "WI22BBFL3" "WI22BBFL7" "WI22MGEB1" "WI22MGEB3"
## [43] "WI22MGEB5" "WI22MGEB9" "WI22MGFL1" "WI22MGFL3" "WI22MGFL7" "WI22SSEB1"
## [49] "WI22SSEB5" "WI22SSEB7" "WI22SSEB9" "WI22SSFL1" "WI22SSFL5" "WI22SSFL8"
## [55] "WI22MLEB1" "WI22MLEB3" "WI22MLEB7" "WI22MLEB9" "WI22MLFL1" "WI22MLFL5"
## [61] "WI22MLFL7" "SP22MGEB1" "SP22MGEB3" "SP22MGEB5" "SP22MGEB7" "SP22MGFL1"
## [67] "SP22MGFL3" "SP22MGFL5" "SP22BBEB1" "SP22BBEB3" "SP22BBEB7" "SP22BBEB8"
## [73] "SP22BBFL2" "SP22BBFL3" "SP22BBFL5" "SP22SSEB1" "SP22SSEB2" "SP22SSEB3"
## [79] "SP22SSEB4" "SP22SSFL1" "SP22SSFL3" "SP22SSFL5" "SP22MLEB1" "SP22MLEB2"
## [85] "SP22MLEB3" "SP22MLFL1" "SP22MLFL2" "SP22MLFL3" "SP22MLFL4"
## [1] "These will be created from related catches"
##  [1] "SU21TWEB2" "SU21TWEB3" "SU21TWEB4" "SU21TWFL1" "SU21TWFL3" "SU21TWFL5"
##  [7] "SU21TWFL7" "AU21MGEB1" "AU21MGEB3" "AU21MGEB5" "AU21MGEB7" "AU21MGFL1"
## [13] "AU21MGFL3" "AU21MGFL5" "AU21BBEB1" "AU21BBEB3" "AU21BBEB6" "AU21BBEB7"
## [19] "AU21BBFL2" "AU21BBFL4" "AU21BBFL6" "AU21SSEB1" "AU21SSEB3" "AU21SSEB5"
## [25] "AU21SSEB8" "AU21SSFL1" "AU21SSFL3" "AU21SSFL6" "AU21MLEB1" "AU21MLEB3"
## [31] "AU21MLEB5" "AU21MLEB7" "AU21MLFL1" "AU21MLFL3" "AU21MLFL7" "WI22TWEB1"
## [37] "WI22TWEB2" "WI22TWEB3" "WI22TWEB4" "WI22TWEB5" "WI22TWEB7" "WI22TWEB9"
## [43] "SP22TWEB1" "SP22TWEB2" "SP22TWEB5" "SP22TWFL1" "SP22TWFL3" "SP22TWFL5"
## [49] "SP22TWFL7"
## [1] "These Fish samples have real matching Waterfilters from the same Catch"
##  [1] "SU21BBEB1" "SU21BBEB2" "SU21BBEB4" "SU21BBEB5" "SU21BBFL3" "SU21SSEB1"
##  [7] "SU21SSEB2" "SU21SSEB3" "SU21SSEB4" "SU21SSEB8" "SU21SSFL1" "SU21SSFL3"
## [13] "SU21MLEB1" "SU21MLEB2" "SU21MLEB3" "SU21MLEB5" "SU21MLEB6" "SU21MLFL1"
## [19] "SU21MLFL2" "SU21MLFL3" "SU21MLFL6" "AU21TWEB1" "AU21TWFL1" "AU21TWFL2"
## [25] "WI22BBEB1" "WI22BBFL1" "WI22BBFL2" "WI22SSEB1" "WI22SSEB2" "WI22SSEB3"
## [31] "WI22SSFL1" "WI22SSFL2" "WI22MLEB1" "WI22MLFL1" "WI22MLFL2" "SP22BBEB1"
## [37] "SP22BBEB3" "SP22BBEB5" "SP22BBEB7" "SP22BBEB9" "SP22BBFL1" "SP22BBFL2"
## [43] "SP22SSEB1" "SP22SSEB3" "SP22SSEB5" "SP22SSEB7" "SP22SSFL1" "SP22SSFL3"
## [49] "SP22SSFL5" "SP22MLEB1" "SP22MLEB3" "SP22MLEB5" "SP22MLEB7" "SP22MLFL1"
## [55] "SP22MLFL3" "SP22MLFL5"
## [1] "These will be created from related catches"
##  [1] "SU21TWEB1" "SU21TWEB2" "SU21TWEB3" "SU21TWEB4" "SU21TWFL1" "SU21TWFL2"
##  [7] "SU21TWFL3" "AU21BBEB1" "AU21BBEB2" "AU21BBEB3" "AU21BBFL1" "AU21BBFL2"
## [13] "AU21BBFL3" "AU21BBFL4" "AU21SSEB1" "AU21SSEB3" "AU21SSEB5" "AU21SSEB7"
## [19] "AU21SSFL1" "AU21SSFL3" "AU21SSFL5" "AU21MLEB1" "AU21MLEB2" "AU21MLEB3"
## [25] "AU21MLFL1" "AU21MLFL3" "AU21MLFL5" "AU21MLFL7" "SP22TWEB1" "SP22TWFL1"
## [31] "SP22TWFL2" "SP22TWFL3"

9.3.3 Integrated Heatmap Bakterioplankton

for (Datasetname  in names(WGCNA_list)[grepl("WF", names(WGCNA_list))]) {
   
  omics_data    <- WGCNA_list[[Datasetname]]$omics_data
  datTraits     <- WGCNA_list[[Datasetname]]$datTraits
  moduleLabels  <- WGCNA_list[[Datasetname]]$moduleLabels
  moduleColors  <- WGCNA_list[[Datasetname]]$moduleColors
  RNA_MEs       <- WGCNA_list[[Datasetname]]$MEs 
    names(RNA_MEs) <- sub("ME", "WF", names(RNA_MEs))
  
  network       <- WGCNA_list[[Datasetname]]$network
  ps            <- WGCNA_list[[Datasetname]]$ps
  traitData     <- WGCNA_list[[Datasetname]]$traitData
  SAM_MEs       <- data.frame(sample_data(WGCNA_list[[Datasetname]]$ps))
  
  ASVsums <- as.data.frame(sapply(WGCNA_list[[Datasetname]]$SSUWGCNAlist, function(x) sum(x$ASVmeans))) 
  ASVsums<- ASVsums %>% 
    round(0) %>%
    setNames("ASVsum") %>%
    rownames_to_column(var = "Module") %>%
    dplyr::arrange(desc(ASVsum))
  ASVsums$Module <- sub("SSU", paste(Datasetname), ASVsums$Module)
  
  ##############################
  #Creating Integraded heatmaps#
  ##############################
  hmps_length <- length(hmps_list)
  FILENAME    <- paste(paste("IntegratedHeatmap", Datasetname, Date, sep="_"),".png", sep="")
  
  WGCNA_IntegratedHeatmap_Bakterioplankton_RK (
        SAM_MEs =  SAM_MEs,
        RNA_MEs =  RNA_MEs,#[names(RNA_MEs) != "WF0"],
        WIDTH     = 10,
        HEIGHT  = 5,
        OutlineColor = "grey20",
        traitData = traitData)

  hmps_list[[hmps_length+1]] <- ht_list
  names(hmps_list)[[hmps_length+1]] <- paste("ht_list", Datasetname, sep="_")
  
  
  prow <- cowplot::plot_grid(grid.grabExpr(ComplexHeatmap::draw(ht_list, auto_adjust = FALSE, 
                    background = "transparent",
                   heatmap_legend_side = "left", annotation_legend_side = "bottom")))
    title <- ggdraw() 
    subtitle <- ggdraw() + draw_label_themeRKwhite(paste("Heatmap MEs", sep = " "), 
              element = "plot.subtitle",x = 0.05, hjust = 0, vjust = 1)
    b <- plot_grid(title, subtitle, prow, ncol = 1, rel_heights = c(0.04, 0.05, 0.98)) 
    #Safe the plots in a specified folder <- may have to change the width and height

    plot(prow)
}

9.3.3.1 Module Top Barplot

 ASVsums <- as.data.frame(sapply(WGCNA_list[[Datasetname]]$SSUWGCNAlist, function(x) sum(x$ASVmeans))) 
  ASVsums<- ASVsums %>% 
    round(0) %>%
    setNames("ASVsum") %>%
    rownames_to_column(var = "Module") %>%
    dplyr::arrange(desc(ASVsum))
  ASVsums$Module <- sub("SSU", paste(Datasetname), ASVsums$Module)

  head(WGCNA_list[[Datasetname]]$SSUWGCNAlist$SSU2$ASVmeans, 5)
## ASV883:Limnohabitans.curvus      ASV32:Polynucleobacter 
##                   0.3236509                   0.2902560 
##         ASV1181:Polaromonas        ASV814:Limnohabitans 
##                   0.2432417                   0.1937816 
##       ASV1318:Limnohabitans 
##                   0.1697100
Module_Top5_Barplot_list <- list() 
bars_list <- list()
for (Datasetname in names(WGCNA_list)[grepl("OE|GC|WF", names(WGCNA_list))]) {
  #Datasetname <- sub("ps_","", Dataset)
  
  ps <- pslist[[paste("ps", Datasetname, sep="_")]]
  
  Module_Top5_Barplot_list[[Datasetname]] <- list()
  bars_list[[Datasetname]] <- list()
  
  for (Interest_ME  in names(WGCNA_list[[Datasetname]]$SSUWGCNAlist)) {

    ################################################
    Interest_ME_dat <- WGCNA_list[[Datasetname]]$SSUWGCNAlist[[paste(Interest_ME,
                                sep="")]]
    
    print(paste("rel ASVmeans in Module", Interest_ME))

      data <- head(
      Interest_ME_dat %>%
        dplyr::mutate(total_sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
        dplyr::group_by(ASV) %>%
        dplyr::mutate(sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
        dplyr::summarize(rel_ASVmeans = round(sum(sum_ASVmeans) / sum(total_sum_ASVmeans) * 100,
                                              digits=2))%>%
        dplyr::arrange(desc(rel_ASVmeans)) %>% 
        as.data.frame() %>% 
        left_join(Interest_ME_dat[c("Phylum", "Class", "Order", "Family", "Genus", "Species", "ASV")])
      , 5)
     print(head(data, 2))
     
     data$ASVcut <-  sub(".*:", "", data$ASV)
     result <- data %>%
       dplyr::group_by(ASVcut) %>%
       dplyr::summarise(rel_ASVmeans = sum(rel_ASVmeans, na.rm = TRUE),
            Phylum = dplyr::first(Phylum),
            Class = dplyr::first(Class),
            Order = dplyr::first(Order),
            Family = dplyr::first(Family),
            Genus = dplyr::first(Genus),
            Species = dplyr::first(Species))
 
    p <- ggplot(result, aes(x = ASVcut, y = rel_ASVmeans, fill = Phylum)) +
    geom_bar(stat = "identity", alpha = 0.7) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    labs(title = "Relative ASV Means", x = "ASV", y = "Relative ASV Means") +
    scale_fill_manual(values = phylum_colors_Cytoscape) +
  #scale_fill_brewer(palette = "Set3") +
     #       annotate("text", x = x_pos, y = y_pos, label = paste(sub("SSU", Datasetname, Interest_ME)), hjust = 0, vjust = 1.5, size = 8, color = "black", fontface = "bold") +
    #       annotate("text", x = Inf, y = Inf, label = paste(sub("SSU", Datasetname, Interest_ME)), hjust = -0.1, vjust = 1.5) +
      coord_flip() +
      theme(panel.border = element_rect(color = "black", fill = NA, size = 1)) 

    
 
    
    y_start <- ifelse(range(result$rel_ASVmeans) >= 0 & range(result$rel_ASVmeans) <= 5, 2, 
                  ifelse(range(result$rel_ASVmeans) > 5 & range(result$rel_ASVmeans) <= 10, 4.5, 
                          ifelse(range(result$rel_ASVmeans) > 10 & range(result$rel_ASVmeans) <= 20, 7, 
                                  ifelse(range(result$rel_ASVmeans) > 20 & range(result$rel_ASVmeans) <= 50, 18, 
                                         ifelse(range(result$rel_ASVmeans) > 50 & range(result$rel_ASVmeans) <= 100, 30,
                          NA)))))
                  


# Add y-axis labels as annotations starting at the calculated position
  p <-  p + geom_text(aes(label = ASVcut, y = y_start[2]), vjust = 1, size = 3.8, fontface='bold') +
      #p <- p + geom_text(aes(label = ASVcut, y = 5), vjust = 1, size = 4) +
      #geom_text(aes(label = ASVcut), vjust = -0.5, size = 3) +
  theme(axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank()) +

      theme(strip.text.y = element_text(angle = 0))  +
        theme(
         panel.background = element_rect(fill='transparent'), #transparent panel bg
         plot.background = element_rect(fill='transparent', color=NA), #transparent plot bg
         #legend.background = element_rect(fill='transparent'), #transparent legend bg
         #legend.box.background = element_rect(fill='transparent') #transparent legend panel
         ) +
      #theme(#axis.title.x.bottom =  element_text(color="grey13"), 
      #  strip.text = element_text(color = "black", face= "bold")) +
      theme(
        #legend.title=element_text(size=12), 
        #legend.text=element_text(size=12), 
        axis.text.x.bottom = element_text(size= 12, face = "bold", angle = 45, hjust = 1))+
        #strip.text.y.left = element_text(size = 12,face = "bold"), 
        #axis.title.y.left = element_text(size=12,face = "bold")) +
        theme(legend.position = "none") +
      #ylab("ASV mean [%]") +
      labs(title = "", x = "", y = "") #module ASV mean [%]

    
    prow <- cowplot::plot_grid(p, labels = c(""), ncol = 1)
    Module_Top5_Barplot_list[[Datasetname]][[Interest_ME]] <- prow
    bars_list[[Datasetname]][[Interest_ME]] <- data.frame(SSU = Interest_ME, bars = dim(result)[1])
  }
}
## [1] "rel ASVmeans in Module SSU8"
##                      ASV rel_ASVmeans         Phylum               Class
## 1 ASV113:Elizabethkingia        29.03   Bacteroidota         Bacteroidia
## 2     ASV351:Citrobacter         8.84 Proteobacteria Gammaproteobacteria
##              Order             Family           Genus Species
## 1 Flavobacteriales      Weeksellaceae Elizabethkingia    <NA>
## 2 Enterobacterales Enterobacteriaceae     Citrobacter    <NA>
## [1] "rel ASVmeans in Module SSU13"
##                      ASV rel_ASVmeans       Phylum       Class            Order
## 1 ASV525:Elizabethkingia        33.45 Bacteroidota Bacteroidia Flavobacteriales
## 2 ASV687:Elizabethkingia        21.91 Bacteroidota Bacteroidia Flavobacteriales
##          Family           Genus Species
## 1 Weeksellaceae Elizabethkingia    <NA>
## 2 Weeksellaceae Elizabethkingia    <NA>
## [1] "rel ASVmeans in Module SSU6"
##                    ASV rel_ASVmeans         Phylum               Class
## 1 ASV1:Elizabethkingia        40.94   Bacteroidota         Bacteroidia
## 2     ASV2:Citrobacter         7.90 Proteobacteria Gammaproteobacteria
##              Order             Family           Genus Species
## 1 Flavobacteriales      Weeksellaceae Elizabethkingia    <NA>
## 2 Enterobacterales Enterobacteriaceae     Citrobacter    <NA>
## [1] "rel ASVmeans in Module SSU12"
##                        ASV rel_ASVmeans         Phylum               Class
## 1        ASV79:Citrobacter        22.01 Proteobacteria Gammaproteobacteria
## 2 ASV85:Enterobacteriaceae        20.40 Proteobacteria Gammaproteobacteria
##              Order             Family       Genus Species
## 1 Enterobacterales Enterobacteriaceae Citrobacter    <NA>
## 2 Enterobacterales Enterobacteriaceae        <NA>    <NA>
## [1] "rel ASVmeans in Module SSU3"
##                    ASV rel_ASVmeans            Phylum            Class
## 1 ASV43:Persicirhabdus        13.44 Verrucomicrobiota Verrucomicrobiae
## 2 ASV93:Persicirhabdus         7.30 Verrucomicrobiota Verrucomicrobiae
##                Order         Family          Genus Species
## 1 Verrucomicrobiales Rubritaleaceae Persicirhabdus    <NA>
## 2 Verrucomicrobiales Rubritaleaceae Persicirhabdus    <NA>
## [1] "rel ASVmeans in Module SSU10"
##                                 ASV rel_ASVmeans         Phylum
## 1 ASV18:Clostridium sensu stricto 1        35.91     Firmicutes
## 2              ASV62:Photobacterium        16.43 Proteobacteria
##                 Class            Order         Family
## 1          Clostridia    Clostridiales Clostridiaceae
## 2 Gammaproteobacteria Enterobacterales   Vibrionaceae
##                         Genus Species
## 1 Clostridium sensu stricto 1    <NA>
## 2              Photobacterium    <NA>
## [1] "rel ASVmeans in Module SSU2"
##                      ASV rel_ASVmeans         Phylum               Class
## 1      ASV65:Pseudomonas        13.70 Proteobacteria Gammaproteobacteria
## 2 ASV110:Elizabethkingia         6.79   Bacteroidota         Bacteroidia
##              Order           Family           Genus Species
## 1  Pseudomonadales Pseudomonadaceae     Pseudomonas    <NA>
## 2 Flavobacteriales    Weeksellaceae Elizabethkingia    <NA>
## [1] "rel ASVmeans in Module SSU11"
##                     ASV rel_ASVmeans            Phylum               Class
## 1  ASV245:Luteolibacter        29.44 Verrucomicrobiota    Verrucomicrobiae
## 2 ASV406:Unknown Family        19.56    Proteobacteria Gammaproteobacteria
##                                Order         Family         Genus Species
## 1                 Verrucomicrobiales Rubritaleaceae Luteolibacter    <NA>
## 2 Gammaproteobacteria Incertae Sedis Unknown Family          <NA>    <NA>
## [1] "rel ASVmeans in Module SSU5"
##                   ASV rel_ASVmeans         Phylum               Class
## 1  ASV36:Alkanindiges         9.89 Proteobacteria Gammaproteobacteria
## 2 ASV27:Psychrobacter         9.39 Proteobacteria Gammaproteobacteria
##             Order        Family         Genus Species
## 1 Pseudomonadales Moraxellaceae  Alkanindiges    <NA>
## 2 Pseudomonadales Moraxellaceae Psychrobacter    <NA>
## [1] "rel ASVmeans in Module SSU7"
##                           ASV rel_ASVmeans         Phylum               Class
## 1 ASV13:Acinetobacter.lwoffii        28.59 Proteobacteria Gammaproteobacteria
## 2         ASV24:Acinetobacter        12.42 Proteobacteria Gammaproteobacteria
##             Order        Family         Genus Species
## 1 Pseudomonadales Moraxellaceae Acinetobacter lwoffii
## 2 Pseudomonadales Moraxellaceae Acinetobacter    <NA>
## [1] "rel ASVmeans in Module SSU4"
##                        ASV rel_ASVmeans         Phylum               Class
## 1    ASV145:Hyphomicrobium         4.08 Proteobacteria Alphaproteobacteria
## 2 ASV150:Xanthobacteraceae         3.45 Proteobacteria Alphaproteobacteria
##         Order            Family          Genus Species
## 1 Rhizobiales Hyphomicrobiaceae Hyphomicrobium    <NA>
## 2 Rhizobiales Xanthobacteraceae           <NA>    <NA>
## [1] "rel ASVmeans in Module SSU1"
##                      ASV rel_ASVmeans       Phylum       Class            Order
## 1    ASV77:Weeksellaceae         4.84 Bacteroidota Bacteroidia Flavobacteriales
## 2 ASV90:Chryseobacterium         4.72 Bacteroidota Bacteroidia Flavobacteriales
##          Family            Genus Species
## 1 Weeksellaceae             <NA>    <NA>
## 2 Weeksellaceae Chryseobacterium    <NA>
## [1] "rel ASVmeans in Module SSU9"
##                   ASV rel_ASVmeans           Phylum               Class
## 1  ASV12:Arthrobacter        33.51 Actinobacteriota      Actinobacteria
## 2 ASV29:Psychrobacter        18.36   Proteobacteria Gammaproteobacteria
##             Order         Family         Genus Species
## 1   Micrococcales Micrococcaceae  Arthrobacter    <NA>
## 2 Pseudomonadales  Moraxellaceae Psychrobacter    <NA>
## [1] "rel ASVmeans in Module SSU0"
##                    ASV rel_ASVmeans            Phylum            Class
## 1  ASV11:Luteolibacter         8.39 Verrucomicrobiota Verrucomicrobiae
## 2 ASV16:Asinibacterium         4.45      Bacteroidota      Bacteroidia
##                Order           Family          Genus Species
## 1 Verrucomicrobiales   Rubritaleaceae  Luteolibacter    <NA>
## 2    Chitinophagales Chitinophagaceae Asinibacterium    <NA>
## [1] "rel ASVmeans in Module SSU3"
##                        ASV rel_ASVmeans         Phylum               Class
## 1           ASV176:SC-I-84         3.96 Proteobacteria Gammaproteobacteria
## 2 ASV150:Xanthobacteraceae         3.81 Proteobacteria Alphaproteobacteria
##             Order            Family Genus Species
## 1 Burkholderiales           SC-I-84  <NA>    <NA>
## 2     Rhizobiales Xanthobacteraceae  <NA>    <NA>
## [1] "rel ASVmeans in Module SSU5"
##                      ASV rel_ASVmeans       Phylum       Class            Order
## 1 ASV31:Chryseobacterium         6.19 Bacteroidota Bacteroidia Flavobacteriales
## 2      ASV48:Deinococcus         5.72 Deinococcota  Deinococci    Deinococcales
##           Family            Genus Species
## 1  Weeksellaceae Chryseobacterium    <NA>
## 2 Deinococcaceae      Deinococcus    <NA>
## [1] "rel ASVmeans in Module SSU9"
##                         ASV rel_ASVmeans         Phylum               Class
## 1 ASV291:Altererythrobacter         6.86 Proteobacteria Alphaproteobacteria
## 2  ASV478:Sphingomonadaceae         5.03 Proteobacteria Alphaproteobacteria
##              Order            Family              Genus Species
## 1 Sphingomonadales Sphingomonadaceae Altererythrobacter    <NA>
## 2 Sphingomonadales Sphingomonadaceae               <NA>    <NA>
## [1] "rel ASVmeans in Module SSU6"
##                   ASV rel_ASVmeans         Phylum               Class
## 1 ASV162:Sphingomonas         6.24 Proteobacteria Alphaproteobacteria
## 2  ASV178:Deinococcus         5.60   Deinococcota          Deinococci
##              Order            Family        Genus Species
## 1 Sphingomonadales Sphingomonadaceae Sphingomonas    <NA>
## 2    Deinococcales    Deinococcaceae  Deinococcus    <NA>
## [1] "rel ASVmeans in Module SSU10"
##                      ASV rel_ASVmeans         Phylum               Class
## 1    ASV39:Dinghuibacter        17.33   Bacteroidota         Bacteroidia
## 2 ASV82:Caulobacteraceae        14.20 Proteobacteria Alphaproteobacteria
##             Order           Family         Genus Species
## 1 Chitinophagales Chitinophagaceae Dinghuibacter    <NA>
## 2 Caulobacterales Caulobacteraceae          <NA>    <NA>
## [1] "rel ASVmeans in Module SSU7"
##                    ASV rel_ASVmeans           Phylum               Class
## 1   ASV238:Halioglobus         5.24   Proteobacteria Gammaproteobacteria
## 2 ASV289:Ilumatobacter         4.51 Actinobacteriota      Acidimicrobiia
##             Order             Family         Genus Species
## 1 Pseudomonadales         Halieaceae   Halioglobus    <NA>
## 2  Microtrichales Ilumatobacteraceae Ilumatobacter    <NA>
## [1] "rel ASVmeans in Module SSU8"
##                    ASV rel_ASVmeans         Phylum               Class
## 1 ASV1:Elizabethkingia        37.29   Bacteroidota         Bacteroidia
## 2     ASV2:Citrobacter         8.70 Proteobacteria Gammaproteobacteria
##              Order             Family           Genus Species
## 1 Flavobacteriales      Weeksellaceae Elizabethkingia    <NA>
## 2 Enterobacterales Enterobacteriaceae     Citrobacter    <NA>
## [1] "rel ASVmeans in Module SSU16"
##                         ASV rel_ASVmeans         Phylum               Class
## 1        ASV971:Citrobacter        17.33 Proteobacteria Gammaproteobacteria
## 2 ASV968:Enterobacteriaceae        17.08 Proteobacteria Gammaproteobacteria
##              Order             Family       Genus Species
## 1 Enterobacterales Enterobacteriaceae Citrobacter    <NA>
## 2 Enterobacterales Enterobacteriaceae        <NA>    <NA>
## [1] "rel ASVmeans in Module SSU4"
##                   ASV rel_ASVmeans           Phylum               Class
## 1  ASV12:Arthrobacter        21.04 Actinobacteriota      Actinobacteria
## 2 ASV34:Psychrobacter         7.36   Proteobacteria Gammaproteobacteria
##             Order         Family         Genus Species
## 1   Micrococcales Micrococcaceae  Arthrobacter    <NA>
## 2 Pseudomonadales  Moraxellaceae Psychrobacter    <NA>
## [1] "rel ASVmeans in Module SSU14"
##                        ASV rel_ASVmeans         Phylum               Class
## 1        ASV79:Citrobacter        20.61 Proteobacteria Gammaproteobacteria
## 2 ASV85:Enterobacteriaceae        20.32 Proteobacteria Gammaproteobacteria
##              Order             Family       Genus Species
## 1 Enterobacterales Enterobacteriaceae Citrobacter    <NA>
## 2 Enterobacterales Enterobacteriaceae        <NA>    <NA>
## [1] "rel ASVmeans in Module SSU13"
##                        ASV rel_ASVmeans       Phylum       Class
## 1 ASV1627:Chryseobacterium        21.29 Bacteroidota Bacteroidia
## 2      ASV2217:Deinococcus        19.17 Deinococcota  Deinococci
##              Order         Family            Genus Species
## 1 Flavobacteriales  Weeksellaceae Chryseobacterium    <NA>
## 2    Deinococcales Deinococcaceae      Deinococcus    <NA>
## [1] "rel ASVmeans in Module SSU2"
##                           ASV rel_ASVmeans         Phylum               Class
## 1 ASV13:Acinetobacter.lwoffii        20.16 Proteobacteria Gammaproteobacteria
## 2         ASV24:Acinetobacter         9.96 Proteobacteria Gammaproteobacteria
##             Order        Family         Genus Species
## 1 Pseudomonadales Moraxellaceae Acinetobacter lwoffii
## 2 Pseudomonadales Moraxellaceae Acinetobacter    <NA>
## [1] "rel ASVmeans in Module SSU15"
##                           ASV rel_ASVmeans         Phylum               Class
## 1         ASV1245:Pseudomonas        15.65 Proteobacteria Gammaproteobacteria
## 2 ASV822:Microcystis PCC-7914        14.48  Cyanobacteria      Cyanobacteriia
##              Order           Family                Genus Species
## 1  Pseudomonadales Pseudomonadaceae          Pseudomonas    <NA>
## 2 Cyanobacteriales   Microcystaceae Microcystis PCC-7914    <NA>
## [1] "rel ASVmeans in Module SSU11"
##                      ASV rel_ASVmeans         Phylum               Class
## 1      ASV15:Verticiella        28.09 Proteobacteria Gammaproteobacteria
## 2 ASV32:Polynucleobacter        12.65 Proteobacteria Gammaproteobacteria
##             Order           Family            Genus Species
## 1 Burkholderiales   Alcaligenaceae      Verticiella    <NA>
## 2 Burkholderiales Burkholderiaceae Polynucleobacter    <NA>
## [1] "rel ASVmeans in Module SSU1"
##                                 ASV rel_ASVmeans     Phylum      Class
## 1 ASV28:Clostridium sensu stricto 1        20.72 Firmicutes Clostridia
## 2 ASV60:Clostridium sensu stricto 1         5.10 Firmicutes Clostridia
##           Order         Family                       Genus Species
## 1 Clostridiales Clostridiaceae Clostridium sensu stricto 1    <NA>
## 2 Clostridiales Clostridiaceae Clostridium sensu stricto 1    <NA>
## [1] "rel ASVmeans in Module SSU12"
##                      ASV rel_ASVmeans         Phylum               Class
## 1      ASV67:Pseudomonas        34.14 Proteobacteria Gammaproteobacteria
## 2 ASV224:Elizabethkingia        30.77   Bacteroidota         Bacteroidia
##              Order           Family           Genus Species
## 1  Pseudomonadales Pseudomonadaceae     Pseudomonas    <NA>
## 2 Flavobacteriales    Weeksellaceae Elizabethkingia    <NA>
## [1] "rel ASVmeans in Module SSU0"
##                                 ASV rel_ASVmeans         Phylum
## 1 ASV18:Clostridium sensu stricto 1         5.30     Firmicutes
## 2            ASV66:Rhodobacteraceae         5.02 Proteobacteria
##                 Class           Order           Family
## 1          Clostridia   Clostridiales   Clostridiaceae
## 2 Alphaproteobacteria Rhodobacterales Rhodobacteraceae
##                         Genus Species
## 1 Clostridium sensu stricto 1    <NA>
## 2                        <NA>    <NA>
## [1] "rel ASVmeans in Module SSU9"
##                    ASV rel_ASVmeans            Phylum            Class
## 1    ASV357:hgcI clade        27.97  Actinobacteriota   Actinobacteria
## 2 ASV115:Luteolibacter        23.29 Verrucomicrobiota Verrucomicrobiae
##                Order          Family         Genus Species
## 1         Frankiales Sporichthyaceae    hgcI clade    <NA>
## 2 Verrucomicrobiales  Rubritaleaceae Luteolibacter    <NA>
## [1] "rel ASVmeans in Module SSU17"
##                        ASV rel_ASVmeans         Phylum               Class
## 1  ASV4933:Ferruginibacter        14.82   Bacteroidota         Bacteroidia
## 2 ASV5485:Polynucleobacter        12.33 Proteobacteria Gammaproteobacteria
##             Order           Family            Genus Species
## 1 Chitinophagales Chitinophagaceae  Ferruginibacter    <NA>
## 2 Burkholderiales Burkholderiaceae Polynucleobacter    <NA>
## [1] "rel ASVmeans in Module SSU4"
##                           ASV rel_ASVmeans           Phylum          Class
## 1       ASV94:Sporichthyaceae        17.97 Actinobacteriota Actinobacteria
## 2 ASV72:CL500-29 marine group        12.38 Actinobacteriota Acidimicrobiia
##            Order             Family                 Genus Species
## 1     Frankiales    Sporichthyaceae                  <NA>    <NA>
## 2 Microtrichales Ilumatobacteraceae CL500-29 marine group    <NA>
## [1] "rel ASVmeans in Module SSU16"
##                              ASV rel_ASVmeans         Phylum
## 1         ASV549:Pseudohongiella        20.94 Proteobacteria
## 2 ASV480:Candidatus Symbiobacter        17.38 Proteobacteria
##                 Class           Order              Family
## 1 Gammaproteobacteria Pseudomonadales Pseudohongiellaceae
## 2 Gammaproteobacteria Burkholderiales      Comamonadaceae
##                     Genus Species
## 1         Pseudohongiella    <NA>
## 2 Candidatus Symbiobacter    <NA>
## [1] "rel ASVmeans in Module SSU23"
##                               ASV rel_ASVmeans           Phylum          Class
## 1 ASV5775:Candidatus Planktophila        23.02 Actinobacteriota Actinobacteria
## 2   ASV6222:Candidatus Aquirestis        19.68     Bacteroidota    Bacteroidia
##             Order          Family                   Genus Species
## 1      Frankiales Sporichthyaceae Candidatus Planktophila    <NA>
## 2 Chitinophagales  Saprospiraceae   Candidatus Aquirestis    <NA>
## [1] "rel ASVmeans in Module SSU15"
##                      ASV rel_ASVmeans          Phylum            Class
## 1 ASV8906:Flavobacterium        10.14    Bacteroidota      Bacteroidia
## 2   ASV5933:Gemmatimonas         7.70 Gemmatimonadota Gemmatimonadetes
##              Order            Family          Genus Species
## 1 Flavobacteriales Flavobacteriaceae Flavobacterium    <NA>
## 2 Gemmatimonadales Gemmatimonadaceae   Gemmatimonas    <NA>
## [1] "rel ASVmeans in Module SSU21"
##                    ASV rel_ASVmeans         Phylum               Class
## 1    ASV6010:Cytophaga        35.38   Bacteroidota         Bacteroidia
## 2 ASV8864:Beijerinckia        16.28 Proteobacteria Alphaproteobacteria
##          Order           Family        Genus Species
## 1 Cytophagales    Cytophagaceae    Cytophaga    <NA>
## 2  Rhizobiales Beijerinckiaceae Beijerinckia    <NA>
## [1] "rel ASVmeans in Module SSU2"
##                           ASV rel_ASVmeans         Phylum               Class
## 1 ASV883:Limnohabitans.curvus         3.84 Proteobacteria Gammaproteobacteria
## 2      ASV32:Polynucleobacter         3.44 Proteobacteria Gammaproteobacteria
##             Order           Family            Genus Species
## 1 Burkholderiales   Comamonadaceae    Limnohabitans  curvus
## 2 Burkholderiales Burkholderiaceae Polynucleobacter    <NA>
## [1] "rel ASVmeans in Module SSU10"
##                                 ASV rel_ASVmeans         Phylum
## 1 ASV751:Flavobacterium.succinicans        11.04   Bacteroidota
## 2                ASV312:Tabrizicola         6.57 Proteobacteria
##                 Class            Order            Family          Genus
## 1         Bacteroidia Flavobacteriales Flavobacteriaceae Flavobacterium
## 2 Alphaproteobacteria  Rhodobacterales  Rhodobacteraceae    Tabrizicola
##       Species
## 1 succinicans
## 2        <NA>
## [1] "rel ASVmeans in Module SSU5"
##                         ASV rel_ASVmeans         Phylum               Class
## 1      ASV565:Limnohabitans        12.48 Proteobacteria Gammaproteobacteria
## 2 ASV1161:Methylomonadaceae         4.98 Proteobacteria Gammaproteobacteria
##             Order            Family         Genus Species
## 1 Burkholderiales    Comamonadaceae Limnohabitans    <NA>
## 2 Methylococcales Methylomonadaceae          <NA>    <NA>
## [1] "rel ASVmeans in Module SSU12"
##                                  ASV rel_ASVmeans         Phylum
## 1 ASV183:Polynucleobacter.difficilis        16.00 Proteobacteria
## 2              ASV745:Pseudarcicella         8.93   Bacteroidota
##                 Class           Order           Family            Genus
## 1 Gammaproteobacteria Burkholderiales Burkholderiaceae Polynucleobacter
## 2         Bacteroidia    Cytophagales    Spirosomaceae   Pseudarcicella
##      Species
## 1 difficilis
## 2       <NA>
## [1] "rel ASVmeans in Module SSU18"
##                     ASV rel_ASVmeans         Phylum               Class
## 1 ASV1189:Sulfurifustis        26.90 Proteobacteria Gammaproteobacteria
## 2        ASV562:Woeseia        13.53 Proteobacteria Gammaproteobacteria
##                  Order                Family         Genus Species
## 1 Acidiferrobacterales Acidiferrobacteraceae Sulfurifustis    <NA>
## 2   Steroidobacterales           Woeseiaceae       Woeseia    <NA>
## [1] "rel ASVmeans in Module SSU3"
##                    ASV rel_ASVmeans         Phylum               Class
## 1    ASV121:Nitrospira         9.26   Nitrospirota         Nitrospiria
## 2 ASV309:Limnohabitans         6.61 Proteobacteria Gammaproteobacteria
##             Order         Family         Genus Species
## 1   Nitrospirales Nitrospiraceae    Nitrospira    <NA>
## 2 Burkholderiales Comamonadaceae Limnohabitans    <NA>
## [1] "rel ASVmeans in Module SSU13"
##                           ASV rel_ASVmeans            Phylum
## 1 ASV13:Acinetobacter.lwoffii        24.03    Proteobacteria
## 2      ASV1676:Roseimicrobium         4.27 Verrucomicrobiota
##                 Class              Order              Family          Genus
## 1 Gammaproteobacteria    Pseudomonadales       Moraxellaceae  Acinetobacter
## 2    Verrucomicrobiae Verrucomicrobiales Verrucomicrobiaceae Roseimicrobium
##   Species
## 1 lwoffii
## 2    <NA>
## [1] "rel ASVmeans in Module SSU14"
##                                  ASV rel_ASVmeans         Phylum
## 1 ASV3310:Rhizobiales Incertae Sedis         9.23 Proteobacteria
## 2            ASV9551:Pseudohongiella         7.14 Proteobacteria
##                 Class           Order                     Family
## 1 Alphaproteobacteria     Rhizobiales Rhizobiales Incertae Sedis
## 2 Gammaproteobacteria Pseudomonadales        Pseudohongiellaceae
##             Genus Species
## 1            <NA>    <NA>
## 2 Pseudohongiella    <NA>
## [1] "rel ASVmeans in Module SSU19"
##                           ASV rel_ASVmeans            Phylum
## 1 ASV3352:Steroidobacteraceae        17.90    Proteobacteria
## 2     ASV2884:Pedosphaeraceae         9.89 Verrucomicrobiota
##                 Class              Order              Family Genus Species
## 1 Gammaproteobacteria Steroidobacterales Steroidobacteraceae  <NA>    <NA>
## 2    Verrucomicrobiae     Pedosphaerales     Pedosphaeraceae  <NA>    <NA>
## [1] "rel ASVmeans in Module SSU7"
##                            ASV rel_ASVmeans           Phylum          Class
## 1 ASV804:Candidatus Aquirestis         8.24     Bacteroidota    Bacteroidia
## 2         ASV114:Mycobacterium         7.32 Actinobacteriota Actinobacteria
##               Order           Family                 Genus Species
## 1   Chitinophagales   Saprospiraceae Candidatus Aquirestis    <NA>
## 2 Corynebacteriales Mycobacteriaceae         Mycobacterium    <NA>
## [1] "rel ASVmeans in Module SSU22"
##                      ASV rel_ASVmeans           Phylum          Class
## 1      ASV813:hgcI clade        63.56 Actinobacteriota Actinobacteria
## 2 ASV904:Anaerolineaceae         9.60      Chloroflexi   Anaerolineae
##            Order          Family      Genus Species
## 1     Frankiales Sporichthyaceae hgcI clade    <NA>
## 2 Anaerolineales Anaerolineaceae       <NA>    <NA>
## [1] "rel ASVmeans in Module SSU11"
##                       ASV rel_ASVmeans           Phylum          Class
## 1 ASV4225:Sporichthyaceae        10.82 Actinobacteriota Actinobacteria
## 2  ASV7295:Pseudarcicella         4.85     Bacteroidota    Bacteroidia
##          Order          Family          Genus Species
## 1   Frankiales Sporichthyaceae           <NA>    <NA>
## 2 Cytophagales   Spirosomaceae Pseudarcicella    <NA>
## [1] "rel ASVmeans in Module SSU8"
##                       ASV rel_ASVmeans         Phylum               Class
## 1          ASV824:Woeseia         9.10 Proteobacteria Gammaproteobacteria
## 2 ASV186:OM60(NOR5) clade         8.55 Proteobacteria Gammaproteobacteria
##                Order      Family            Genus Species
## 1 Steroidobacterales Woeseiaceae          Woeseia    <NA>
## 2    Pseudomonadales  Halieaceae OM60(NOR5) clade    <NA>
## [1] "rel ASVmeans in Module SSU20"
##                    ASV rel_ASVmeans       Phylum       Class            Order
## 1    ASV651:Actibacter        14.94 Bacteroidota Bacteroidia Flavobacteriales
## 2 ASV718:Robiginitalea        14.25 Bacteroidota Bacteroidia Flavobacteriales
##              Family         Genus Species
## 1 Flavobacteriaceae    Actibacter    <NA>
## 2 Flavobacteriaceae Robiginitalea    <NA>
## [1] "rel ASVmeans in Module SSU6"
##                    ASV rel_ASVmeans            Phylum            Class
## 1 ASV43:Persicirhabdus        13.93 Verrucomicrobiota Verrucomicrobiae
## 2 ASV466:Ilumatobacter         5.11  Actinobacteriota   Acidimicrobiia
##                Order             Family          Genus Species
## 1 Verrucomicrobiales     Rubritaleaceae Persicirhabdus    <NA>
## 2     Microtrichales Ilumatobacteraceae  Ilumatobacter    <NA>
## [1] "rel ASVmeans in Module SSU1"
##                           ASV rel_ASVmeans       Phylum       Class
## 1 ASV313:NS11-12 marine group         7.37 Bacteroidota Bacteroidia
## 2       ASV380:Cryomorphaceae         6.24 Bacteroidota Bacteroidia
##                Order               Family Genus Species
## 1 Sphingobacteriales NS11-12 marine group  <NA>    <NA>
## 2   Flavobacteriales       Cryomorphaceae  <NA>    <NA>
## [1] "rel ASVmeans in Module SSU24"
##                             ASV rel_ASVmeans       Phylum       Class
## 1            ASV7848:Ulvibacter        49.77 Bacteroidota Bacteroidia
## 2 ASV12123:NS11-12 marine group        33.00 Bacteroidota Bacteroidia
##                Order               Family      Genus Species
## 1   Flavobacteriales    Flavobacteriaceae Ulvibacter    <NA>
## 2 Sphingobacteriales NS11-12 marine group       <NA>    <NA>
## [1] "rel ASVmeans in Module SSU0"
##                    ASV rel_ASVmeans           Phylum               Class
## 1   ASV2334:hgcI clade         3.64 Actinobacteriota      Actinobacteria
## 2 ASV423:Methylotenera         3.47   Proteobacteria Gammaproteobacteria
##             Order           Family         Genus Species
## 1      Frankiales  Sporichthyaceae    hgcI clade    <NA>
## 2 Burkholderiales Methylophilaceae Methylotenera    <NA>
####
#OE#
####
  cowplot::plot_grid(Module_Top5_Barplot_list$OE$SSU8, Module_Top5_Barplot_list$OE$SSU13,
                   Module_Top5_Barplot_list$OE$SSU6, Module_Top5_Barplot_list$OE$SSU12,
                   Module_Top5_Barplot_list$OE$SSU3,  Module_Top5_Barplot_list$OE$SSU10,
                   Module_Top5_Barplot_list$OE$SSU2, nrow=1, 
  labels = c("OE8", "OE13", "OE6", "OE12", "OE3", "OE10", "OE2")) -> part_1

  cowplot::plot_grid( Module_Top5_Barplot_list$OE$SSU11, Module_Top5_Barplot_list$OE$SSU5,
                    Module_Top5_Barplot_list$OE$SSU7,Module_Top5_Barplot_list$OE$SSU4,
                    Module_Top5_Barplot_list$OE$SSU1, Module_Top5_Barplot_list$OE$SSU9,
                    Module_Top5_Barplot_list$OE$SSU0, nrow=1, 
                    labels = c( "OE11", "OE5", "OE7", "OE4", "OE1", "OE9", "OE0")) -> part_2
  cowplot::plot_grid(part_1, part_2, nrow= 2) -> part_3
  ggsave(part_3, filename = paste(paste("Module_topTaxa_plot", 
          sep="_"),".png", sep=""), path = pathPlots , device='png', dpi=300, width = 16.5,height = 6)
  
  plot(part_3)

####
#GC#
####
cowplot::plot_grid(Module_Top5_Barplot_list$GC$SSU3, Module_Top5_Barplot_list$GC$SSU5,
                   Module_Top5_Barplot_list$GC$SSU9, Module_Top5_Barplot_list$GC$SSU6,
                   Module_Top5_Barplot_list$GC$SSU10,  Module_Top5_Barplot_list$GC$SSU7,
                   Module_Top5_Barplot_list$GC$SSU8, Module_Top5_Barplot_list$GC$SSU16, 
                   Module_Top5_Barplot_list$GC$SSU4, 
                   nrow=1, 
labels = c("GC3", "GC5", "GC9", "GC6", "GC10", "GC7", "GC8", "GC16", "GC4")) -> part_1

cowplot::plot_grid( Module_Top5_Barplot_list$GC$SSU14, Module_Top5_Barplot_list$GC$SSU13,
                    Module_Top5_Barplot_list$GC$SSU2,Module_Top5_Barplot_list$GC$SSU15,
                    Module_Top5_Barplot_list$GC$SSU11, Module_Top5_Barplot_list$GC$SSU1,
                    Module_Top5_Barplot_list$GC$SSU12, Module_Top5_Barplot_list$GC$SSU0,
                    nrow=1, 
                    labels = c( "GC14", "GC13", "GC2", "GC15", "GC11", "GC1", "GC12", "GC0")) -> part_2
cowplot::plot_grid(part_1, part_2, nrow= 2) -> part_3
ggsave(part_3, filename = paste(paste("Module_topTaxa_plot_GC", 
          sep="_"),".png", sep=""), path = pathPlots , device='png', dpi=300, width = 18,height = 6)
  plot(part_3)

#### 
#WF#
####

cowplot::plot_grid(Module_Top5_Barplot_list$WF$SSU9, Module_Top5_Barplot_list$WF$SSU17,
                   Module_Top5_Barplot_list$WF$SSU4, Module_Top5_Barplot_list$WF$SSU16,
                   Module_Top5_Barplot_list$WF$SSU23,  Module_Top5_Barplot_list$WF$SSU15,
                   Module_Top5_Barplot_list$WF$SSU21, Module_Top5_Barplot_list$WF$SSU2, 
                   Module_Top5_Barplot_list$WF$SSU10, 
                   nrow=1, 
labels = c("WF9", "WF17", "WF4", "WF16", "WF23", "WF15", "WF21", "WF2", "WF10")) -> part_1

cowplot::plot_grid(Module_Top5_Barplot_list$WF$SSU5, Module_Top5_Barplot_list$WF$SSU12,
                   Module_Top5_Barplot_list$WF$SSU18,Module_Top5_Barplot_list$WF$SSU3,
                   Module_Top5_Barplot_list$WF$SSU13, Module_Top5_Barplot_list$WF$SSU14,
                   Module_Top5_Barplot_list$WF$SSU19,Module_Top5_Barplot_list$WF$SSU7,
                    nrow=1, 
                   labels = c("WF5", "WF12", "WF18", "WF3", "WF13", "WF14", "WF19", "WF7")) -> part_2

cowplot::plot_grid( 
                    Module_Top5_Barplot_list$WF$SSU22, Module_Top5_Barplot_list$WF$SSU11,
                    Module_Top5_Barplot_list$WF$SSU8, Module_Top5_Barplot_list$WF$SSU20,
                    Module_Top5_Barplot_list$WF$SSU6, Module_Top5_Barplot_list$WF$SSU1,
                    Module_Top5_Barplot_list$WF$SSU24, Module_Top5_Barplot_list$WF$SSU0,
                    nrow=1, 
                    labels = c(  "WF22", "WF11", "WF8", "WF20", "WF6", "WF1", "WF24", "WF0")) -> part_3
cowplot::plot_grid(part_1, part_2, part_3, nrow= 3) -> part_4

ggsave(part_4, filename = paste(paste("Module_topTaxa_plot_WF", 
          sep="_"),".png", sep=""), path = pathPlots , device='png', dpi=300, width = 16,height = 9)
 plot(part_4)

 cowplot::plot_grid(Module_Top5_Barplot_list$WF$SSU9, Module_Top5_Barplot_list$WF$SSU17,
                   Module_Top5_Barplot_list$WF$SSU4, Module_Top5_Barplot_list$WF$SSU16,
                   Module_Top5_Barplot_list$WF$SSU23,  Module_Top5_Barplot_list$WF$SSU15,
                   Module_Top5_Barplot_list$WF$SSU21, 
                   nrow=1, 
labels = c("WF9", "WF17", "WF4", "WF16", "WF23", "WF15", "WF21")) -> part_1

cowplot::plot_grid(Module_Top5_Barplot_list$WF$SSU2, Module_Top5_Barplot_list$WF$SSU10, 
                   Module_Top5_Barplot_list$WF$SSU5, Module_Top5_Barplot_list$WF$SSU12,
                   Module_Top5_Barplot_list$WF$SSU18,Module_Top5_Barplot_list$WF$SSU3,
                    nrow=1, 
                   labels = c("WF2", "WF10", "WF5", "WF12", "WF18", "WF3")) -> part_2

cowplot::plot_grid( Module_Top5_Barplot_list$WF$SSU13, Module_Top5_Barplot_list$WF$SSU14,
                    Module_Top5_Barplot_list$WF$SSU19,Module_Top5_Barplot_list$WF$SSU7,
                    Module_Top5_Barplot_list$WF$SSU22, Module_Top5_Barplot_list$WF$SSU11,
                   
                    nrow=1, 
                    labels = c("WF13", "WF14", "WF19", "WF7", "WF22", "WF11", "WF8", "WF20", "WF6", "WF1", "WF24", "WF0")) -> part_3


cowplot::plot_grid( Module_Top5_Barplot_list$WF$SSU8, Module_Top5_Barplot_list$WF$SSU20,
                    Module_Top5_Barplot_list$WF$SSU6, Module_Top5_Barplot_list$WF$SSU1,
                    Module_Top5_Barplot_list$WF$SSU24, Module_Top5_Barplot_list$WF$SSU0,
                    nrow=1, 
                    labels = c( "WF8", "WF20", "WF6", "WF1", "WF24", "WF0")) -> part_4
 cowplot::plot_grid(part_1, part_2, part_3, part_4, nrow= 4) -> part_5
ggsave(part_5, filename = paste(paste("Module_topTaxa_plot_WF", 
          sep="_"),".png", sep=""), path = pathPlots , device='png', dpi=300, width = 12,height = 11)

9.3.4 Scatter with Kin

Lets check our understanding of these measurements: 1) kTotal - connectivity of the each gene based on its r-values to all other genes in the whole network 2) kWithin - connectivity of the each gene within a single module based on its r-values to all other genes within the same module 3) and 4) kOut and kDiff mathematical derivatives from 1) and 2) So, let say WGCNA identified 10 modules, but kWithin for “Module 2” is the largest and obviously larger than kTotal. This suggest “Module 2” to be a core of the network, or more important. Let say You perform functional annotation of the genes in this module and will find them to be enriched in amino acid analysis. And Your study was dedicated to metabolic changes under some conditions. So You can make conclusion that Your condition strongly affects AA metabolism, and via it - whole network. In contrast, high kOut can suggest that total connectivity is much larger that connectivity within modules, and I would say - reflects sort of network’s stability under tested conditions. So a set of vertices with high kOut can be targets for annotation as hubs that determined this stability. And elimination of them (knockout mutations, for example) can break the whole network.

################################################
#Scatter GS vs ModuleMembership over all Traits#
################################################
Kin_data <- list()
for (Datasetname  in names(WGCNA_list)[grepl("OE|GC", names(WGCNA_list))]) {

  require(WGCNA)
  Kin_data[[Datasetname]] <- list()
  omics_data    <- WGCNA_list[[Datasetname]]$omics_data
  datTraits     <- WGCNA_list[[Datasetname]]$datTraits
  moduleLabels  <- WGCNA_list[[Datasetname]]$moduleLabels
  moduleColors  <- WGCNA_list[[Datasetname]]$moduleColors
  MEs           <- WGCNA_list[[Datasetname]]$MEs
  network       <- WGCNA_list[[Datasetname]]$network
  ps            <- WGCNA_list[[Datasetname]]$ps
  softPower     <- WGCNA_list[[Datasetname]]$softPower
  SSUWGCNAlist  <- WGCNA_list[[Datasetname]]$SSUWGCNAlist
  
  TaxAnno       <- data.frame(tax_table(WGCNA_list$WF$ps)) %>% rownames_to_column(var="ASV")
  ModsOfInterst_list <- WGCNA_list[[Datasetname]]$ModsOfInterst_list[[paste("ModsOfInterest", 
                                                                            Datasetname, sep="_")]]
  

  module_size<- as.data.frame(table(moduleLabels))
  colnames(module_size) <- c("Module", "Size")
  module_size %>% dplyr::mutate(Module_color = labels2colors(as.numeric(as.character(Module)))) -> 
  module_size

  degrees <- intramodularConnectivity.fromExpr(as.data.frame(omics_data), 
                                             colors = moduleColors, power = softPower,
                                             networkType = "signed", distFnc = "bicor")
  degrees<- cbind(moduleLabels, degrees )

        
  #############################################################################
  #Creating GeneMembership and GeneSignificance Dataframes with GeneAnnotation#
  #############################################################################

  for (TraitOfInterest in colnames(datTraits)) {
    
    if (length(rownames(ModsOfInterst_list[[TraitOfInterest]])) != 0) {
    
      Kin_data[[Datasetname]][[TraitOfInterest]] <- list()
    

      for (MODULE in rownames(ModsOfInterst_list[[TraitOfInterest]])) {
            
      print(MODULE)
          
      modNames = names(MEs) #substring(names(MEs), 2)
      column = match(MODULE, modNames);
      ASV <- names(moduleLabels[moduleLabels == sub("ME","", MODULE)])
      moduleColor <-  labels2colors(as.numeric(as.character( sub("ME","", MODULE))))
      nSamples = nrow(omics_data)
      Data <- as.data.frame(cbind(ASV, moduleColor))
      Data  <- Data %>%
         left_join( #Add relative ASVmeans
         data.frame(t(phyloseq::otu_table(ps %>%
          transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
            dplyr::mutate(ASVmeans = rowMeans(.)) %>%
            dplyr::mutate(ASV = rownames(.)) %>%
            dplyr::select(ASV, ASVmeans)) %>%
        left_join(as.data.frame(tax_table(ps)) %>% rownames_to_column(var="ASV"))
    
    geneModuleMembership <- cor(omics_data, MEs, use = "p") %>%
        as.data.frame() %>% #Create ModuleMembership 
        dplyr::select(paste("ME", sub("ME","", MODULE), sep="")) %>%
        setNames(paste0("MM"))  #No Individual naming here for Cytoscape columns beeing equal
    
    MMPvalue <- #Creating p.values for Module memberships
        as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), length(rownames(MEs)))) %>%
        dplyr::select(paste("MM", sep="")) %>%
        setNames(paste0("p.MM")) 
    
    geneTraitSignificance = as.data.frame(WGCNA::cor(omics_data,  datTraits[TraitOfInterest], use = "p"));
        GSPvalue = as.data.frame(WGCNA::corPvalueStudent(as.matrix(geneTraitSignificance), nSamples));
        names(geneTraitSignificance) = "GS"
        names(GSPvalue) = "p.GS"
 
    Data <- Data %>%
       left_join(geneModuleMembership  %>% rownames_to_column(var="ASV")) %>%
       left_join(MMPvalue  %>% rownames_to_column(var="ASV")) %>%
       left_join(geneTraitSignificance  %>% rownames_to_column(var="ASV")) %>%
       left_join(GSPvalue  %>% rownames_to_column(var="ASV"))  %>%
       left_join(degrees%>% rownames_to_column(var = "ASV"))
       # left_join(SeasonSums) %>%
       # left_join(RepSums)
    #dplyr::arrange(desc(ASVmeans))
    rownames(Data) <- Data$ASV
    
    Data$ASVname <- gsub("^ASV\\d+:", "", Data$ASV)
    Data$Trait <- TraitOfInterest
    
    #######################
    #Filter for Parameters#
    #######################
    #Data_filt <- Data %>% 
    #        filter(abs(MM) > 0.5 & p.MM < 0.05 & ASVmeans > 0.005) 
    Kin_data[[Datasetname]][[TraitOfInterest]][[MODULE]] <- Data
      }
  
        ######
        #Plot#
        ######
        if (!is.null(Kin_data[[Datasetname]][[TraitOfInterest]])) {
          BB <- do.call(rbind, Kin_data[[Datasetname]][[TraitOfInterest]])
        }
        BB$Modules <- sub("\\..*","",rownames(BB))
        rownames(BB) <-NULL
        
        #if (any(BB$moduleColor != "grey")) {
        #  BBB <- BB #[BB$moduleColor != "grey", ]
        #  } else {
          BBB <- BB
        #  }
        
        BBB$Modules <- sub(paste("_", TraitOfInterest, sep=""), "", BBB$Modules)
        BBB$Modules <- sub("ME", "", BBB$Modules)

        names(BBB) <- gsub(TraitOfInterest, "", names(BBB))
        BBB$TraitOfInterest <- TraitOfInterest
        
      require(cowplot)
      tryCatch({
      FILENAME    <- paste(paste("MultiModuleScatter-kwithin", Datasetname, TraitOfInterest, sep="_"),
                          ".png", sep="")

      plot <- ggplot(data = BBB, aes(x = .data$MM, y = .data$GS)) +
        geom_point(aes(colour = .data$moduleColor, size = ASVmeans), alpha = 0.5) +
        scale_size_continuous(range = c(0.4, 4)) +  # Adjust the size range
        xlab(paste("Module Membership", sep = " ")) +
        ylab(paste("ASV significance for", TraitOfInterest, sep = " ")) +
        labs(title = "",#paste(Tissue, Type, "Module Membership vs. ASV Significance")
        color = "sig. Modules") +
        scale_color_identity(guide = 'legend',
                       breaks = unique(BBB$moduleColor),
                       labels = paste(Datasetname, unique(BBB$moduleLabels))) +
        geom_hline(yintercept = 0, color = "red", linetype = "dashed")
      
        label <-  BBB %>%
                dplyr::arrange(desc(ASVmeans)) %>%
                dplyr::slice(c(1:5, (n() - 4):n()))
        
        label2 <- BBB %>%
                dplyr::arrange(desc(GS)) %>%
                dplyr::slice(c(1:5, (n() - 4):n()))
        label <- rbind(label, label2)
        label <- label[!duplicated(label$ASV),]

        plot <- plot + 
            ggrepel::geom_text_repel(
            data = label, size = 4,  aes(label = ASVname), color = OutlineColor) 
        
        # plot <- plot + ggrepel::geom_text_repel(
        #   data = BBB, size = 2.5, aes(label = ASVname), color = OutlineColor)
    
        # plot <- plot + theme_minimal() + atheme + theme(axis.title.x = element_blank()) +
        # theme(
        #   panel.grid.major = element_line(colour = "grey50"),
        #   panel.grid.minor = element_line(colour = "grey50"),
        #   legend.position = "right")
      
       plot <- plot + theme_minimal() + 
          #atheme+
          #theme(legend.position = "none") +
          theme(axis.title.x = element_blank()) +
          theme(
          #panel.grid.major = element_line(colour = "grey50"), 
          #panel.grid.minor = element_line(colour = "grey50"),
          legend.position = "right", 
          legend.title = element_text( size = 12, face = "bold"),
          legend.text = element_text(size = 12,face = "bold"), 
          axis.text.x.bottom = element_text(size = 15, face = "bold", angle = 45, hjust = 1),
          axis.text.y.left = element_text(size = 15, face = "bold"),
          axis.title.x = element_text(size = 15, face = "bold"),  # Axis title for x
          axis.title.y = element_text(size = 15, face = "bold"))

        ggsave(plot, filename = FILENAME, path = pathPlots, device='png', dpi=300, width = 6.5, height = 5)
    }, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
        
    }
  }
  WGCNA_list[[Datasetname]][["Kin_data"]] <- Kin_data
  }
##  softConnectivity: FYI: connecitivty of genes with less than 46 valid samples will be returned as NA.
##  ..calculating connectivities.. 
## [1] "ME6"
## [1] "ME11"
## [1] "ME6"
## [1] "ME12"
## [1] "ME11"
## [1] "ME9"
## [1] "ME6"
## [1] "ME11"
## [1] "ME5"
## [1] "ME1"
## [1] "ME13"
## [1] "ME8"
## [1] "ME13"
## [1] "ME6"
## [1] "ME7"
## [1] "ME1"
## [1] "ME3"
## [1] "ME7"
## [1] "ME3"
## [1] "ME10"
## [1] "ME2"
## [1] "ME5"
## [1] "ME1"
## [1] "ME9"
## [1] "ME13"
## [1] "ME5"
## [1] "ME7"
## [1] "ME1"
## [1] "ME9"
## [1] "ME13"
## [1] "ME6"
## [1] "ME7"
## [1] "ME3"
## [1] "ME10"
## [1] "ME2"
## [1] "ME5"
## [1] "ME7"
## [1] "ME1"
## [1] "ME13"
## [1] "ME6"
## [1] "ME5"
## [1] "ME7"
## [1] "ME1"
## [1] "ME9"
## [1] "ME6"
## [1] "ME11"
## [1] "ME3"
## [1] "ME10"
## [1] "ME2"
## [1] "ME4"
## [1] "ME0"
##  softConnectivity: FYI: connecitivty of genes with less than 30 valid samples will be returned as NA.
##  ..calculating connectivities.. 
## [1] "ME3"
## [1] "ME7"
## [1] "ME6"
## [1] "ME16"
## [1] "ME6"
## [1] "ME7"
## [1] "ME11"
## [1] "ME7"
## [1] "ME11"
## [1] "ME7"
## [1] "ME2"
## [1] "ME15"
## [1] "ME11"
## [1] "ME5"
## [1] "ME9"
## [1] "ME7"
## [1] "ME8"
## [1] "ME2"
## [1] "ME15"
## [1] "ME11"
## [1] "ME5"
## [1] "ME9"
## [1] "ME6"
## [1] "ME6"
## [1] "ME8"
## [1] "ME16"
## [1] "ME13"
## [1] "ME2"
## [1] "ME15"
## [1] "ME11"
## [1] "ME12"
## [1] "ME6"
## [1] "ME10"
## [1] "ME13"
## [1] "ME2"
## [1] "ME5"
## [1] "ME6"
## [1] "ME10"
## [1] "ME16"
## [1] "ME5"
## [1] "ME9"
## [1] "ME6"
## [1] "ME16"
## [1] "ME4"
## [1] "ME13"
## [1] "ME2"
## [1] "ME12"
## [1] "ME8"
## [1] "ME16"
## [1] "ME2"
## [1] "ME3"
## [1] "ME7"
## [1] "ME11"

9.3.4.2 Specific Correlation

Interest <- list("NO2" = 
                   list("OE" = "ME7", "GC" = "ME2"), 
                 "O2" = 
                   list("OE" = "ME7", "GC" = "ME2"), 
                  "Salinity" = 
                   list("OE" = "ME3", "GC" = "ME7") 
                 )
for (Datasetname in names(WGCNA_list)[grepl("OE|GC", names(WGCNA_list))]) {
  
  for (TraitOfInterest in names(Interest)) {

  GS_SSU <- WGCNA_list[[Datasetname]]$Kin_data[[Datasetname]][[TraitOfInterest]]
  MODULE <- Interest[[TraitOfInterest]][[Datasetname]]
  GS_SSU <- GS_SSU[[MODULE]]
  color_mapping<- pslist$Optics$colored_taxonomy_Fish

  #GS_SSU <- do.call(rbind, Kin_data$OE$NO2$ME7)  #%>% 
            #filter(abs(MM) > 0.5 & p.MM < 0.05 & ASVmeans > 0.005 & GS < -0.2) 

  COL <- na.omit(color_mapping[unique(names(color_mapping))])


  GS_SSU$Taxa <- GS_SSU$ASVname
  
  GS_SSU <- left_join(GS_SSU, COL,copy = TRUE, keep = NULL)

  GS_SSU$Color<- GS_SSU$Color %>% replace_na("grey50")

    color_mapping <- setNames(GS_SSU$Color, GS_SSU$ASVname)
    color_mapping[["Candidatus Megaira"]] <- "purple"
    color_mapping[["Acinetobacter.lwoffii"]] <- "#996600"
    color_mapping[["Acinetobacter.johnsonii"]] <- "#663300"
    color_mapping[["Shewanella"]] <- "#FF3366"
    color_mapping[["Shewanella.baltica"]] <- "#FF0099"
    color_mapping[["Shewanella.putrefaciens"]] <-"#FF0066"
    color_mapping[["Photobacterium"]] <- "#FFF000"
    color_mapping[["Aeromonas"]] <- "#FFCC00"

  BacteriaHostCorrelationPlot <- ggplot(data = GS_SSU, aes(x = MM, y = GS)) + 
          geom_point(aes(colour = ASVname, size = ASVmeans), alpha = 0.8) +
          scale_size_continuous(range = c(1, 8)) +  # Adjust the size range
          xlab(paste("Intramodular correlation", sep = " ")) + 
          ylab(paste("Correlation to",unique(GS_SSU$Trait) , sep = " ")) +
          labs(#title = paste("Bacterial abundance correlated to host Gill-", subset_df$RNA_Module[1],
              #               sep=""),
          color = "Module") +
          theme(legend.position = "none") +
          scale_color_manual(values = color_mapping) +
          #scale_color_identity(guide = 'legend', 
          #             breaks = unique(GS_SSU$moduleColor), 
          #             labels = paste(Datasetname, unique(GS_SSU$moduleLabels), sep="")) +
          geom_hline(yintercept = 0, color = "red", size = 1,linetype = "dashed")

      #BacteriaHostCorrelatonPlot <- BacteriaHostCorrelatonPlot + 
      #      ggrepel::geom_text_repel(
      #      data = GS_SSU, size = 4,  aes(label = ASVname), color = OutlineColor) #white
      
  
      label <-  GS_SSU %>%
                dplyr::arrange(desc(ASVmeans)) %>%
                head(10)

      BacteriaHostCorrelationPlot <- BacteriaHostCorrelationPlot + 
            ggrepel::geom_text_repel(
            data = label, size = 4,  aes(label = ASVname), color = OutlineColor) 
   
      

      BacteriaHostCorrelationPlot <- BacteriaHostCorrelationPlot + theme_minimal() + 
          #atheme+
          theme(legend.position = "none") +
          theme(axis.title.x = element_blank()) +
          theme(
          #panel.grid.major = element_line(colour = "grey50"), 
          #panel.grid.minor = element_line(colour = "grey50"),
          #legend.position = "right", 
          legend.title = element_text( size = 12, face = "bold"),
          legend.text = element_text(size = 12,face = "bold"), 
          axis.text.x.bottom = element_text(size = 15, face = "bold", angle = 45, hjust = 1),
          axis.text.y.left = element_text(size = 15, face = "bold"),
          axis.title.x = element_text(size = 15, face = "bold"),  # Axis title for x
          axis.title.y = element_text(size = 15, face = "bold"))
          
      ggsave(BacteriaHostCorrelationPlot, filename = paste(paste(save_name, "GS_SSU", Datasetname,
            unique(GS_SSU$moduleLabels), unique(GS_SSU$Trait), sep="_"), ".png", sep="") , 
            path = pathPlots, device='png', dpi=300, width = 4, height = 4)
      
      plot(BacteriaHostCorrelationPlot)
  }
}
## Joining with `by = join_by(Taxa)`
## Joining with `by = join_by(Taxa)`

## Joining with `by = join_by(Taxa)`

## Joining with `by = join_by(Taxa)`

## Joining with `by = join_by(Taxa)`

## Joining with `by = join_by(Taxa)`

9.3.5 WGCNA & Core stats

#Overall abundance per Module
ASVsums <- as.data.frame(sapply(WGCNA_list$OE$SSUWGCNAlist, function(x) sum(x$ASVmeans))) 
ASVsums<- ASVsums %>% 
  setNames("ASVsum") %>%
  rownames_to_column(var = "Module") %>%
  dplyr::arrange(desc(ASVsum))
ASVsums
##    Module      ASVsum
## 1    SSU6 38.16113612
## 2    SSU0 28.13226744
## 3    SSU5  6.52820305
## 4    SSU1  6.45842648
## 5    SSU9  5.11210527
## 6    SSU7  4.05596022
## 7    SSU4  3.73560975
## 8    SSU3  2.59369432
## 9    SSU2  1.99255136
## 10   SSU8  1.00034891
## 11  SSU10  0.86454418
## 12  SSU12  0.77910329
## 13  SSU11  0.49865859
## 14  SSU13  0.08739103
ASVsums <- as.data.frame(sapply(WGCNA_list$GC$SSUWGCNAlist, function(x) sum(x$ASVmeans))) 
ASVsums<- ASVsums %>% 
  setNames("ASVsum") %>%
  rownames_to_column(var = "Module") %>%
  dplyr::arrange(desc(ASVsum))
ASVsums
##    Module      ASVsum
## 1    SSU8 44.76220787
## 2    SSU0 19.00825798
## 3    SSU5  8.04758341
## 4   SSU11  6.79125083
## 5    SSU2  4.37238718
## 6    SSU4  4.10244216
## 7    SSU1  3.22182500
## 8    SSU3  3.06561530
## 9   SSU10  1.58117727
## 10  SSU14  1.21377615
## 11   SSU6  0.86314919
## 12   SSU7  0.84463946
## 13  SSU12  0.77722663
## 14   SSU9  0.71343565
## 15  SSU16  0.42434350
## 16  SSU13  0.13114311
## 17  SSU15  0.07953932
ASVsums <- as.data.frame(sapply(WGCNA_list$WF$SSUWGCNAlist, function(x) sum(x$ASVmeans))) 
ASVsums<- ASVsums %>% 
  setNames("ASVsum") %>%
  rownames_to_column(var = "Module") %>%
  dplyr::arrange(desc(ASVsum))
ASVsums
##    Module      ASVsum
## 1    SSU4 18.67512697
## 2    SSU1 17.95200467
## 3    SSU3 12.69217555
## 4    SSU2  8.42571784
## 5    SSU6  8.08816629
## 6   SSU12  5.33509446
## 7    SSU0  5.15722962
## 8   SSU10  4.66452607
## 9    SSU7  3.98598817
## 10   SSU5  3.22604492
## 11   SSU9  2.83619097
## 12  SSU16  2.50752627
## 13   SSU8  2.05215664
## 14  SSU11  1.21562483
## 15  SSU22  0.77579042
## 16  SSU18  0.50431036
## 17  SSU13  0.35413346
## 18  SSU15  0.27380863
## 19  SSU14  0.26776191
## 20  SSU20  0.21968099
## 21  SSU17  0.21803867
## 22  SSU19  0.21790857
## 23  SSU23  0.19137072
## 24  SSU21  0.12892261
## 25  SSU24  0.03470041
####################################################
#Prevalence core vs network core & Bacterioplankton#
####################################################
for (Datasetname in names(WGCNA_list)) {
  #Datasetname <- sub("ps_","", Dataset)
  
  ps <- pslist[[paste("ps", Datasetname, sep="_")]]
  
  if (Datasetname == "OE") {
    Core_ME <- "6"
  } else if (Datasetname == "GC") {
    Core_ME <- "8"
  }
    
  SampleSums <- ps %>%
      transform_sample_counts(function(x) {x/sum(x)*100}) %>%
      phyloseq::otu_table() %>%
      as.data.frame() %>%
      rownames_to_column(var = "SampleID") %>%
      dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
      dplyr::group_by(SampleID) %>%
      dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>% 
      t() %>%
      as.data.frame() %>%
      `colnames<-`(.[1, ]) %>%
      .[-1, ] %>%
      stats::setNames(paste0("avg_", colnames(.))) %>%
      mutate_all(as.numeric) %>%
      round(., digits = 2) %>%
      rownames_to_column(var="ASV")
  
   SeasonSums <- ps %>%
      transform_sample_counts(function(x) {x/sum(x)*100}) %>%
      phyloseq::otu_table() %>%
      as.data.frame() %>%
      rownames_to_column(var = "SampleID") %>%
      dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
      dplyr::group_by(Season) %>%
      dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>% 
      t() %>%
      as.data.frame() %>%
      `colnames<-`(.[1, ]) %>%
      .[-1, ] %>%
      stats::setNames(paste0("avg_", colnames(.))) %>%
      mutate_all(as.numeric) %>%
      round(., digits = 2) %>%
      rownames_to_column(var="ASV")
   
   RepsSums <- ps %>%
      transform_sample_counts(function(x) {x/sum(x)*100}) %>%
      phyloseq::otu_table() %>%
      as.data.frame() %>%
      rownames_to_column(var = "SampleID") %>%
      dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
      dplyr::group_by(Replicates) %>%
      dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>% 
      t() %>%
      as.data.frame() %>%
      `colnames<-`(.[1, ]) %>%
      .[-1, ] %>%
      stats::setNames(paste0("avg_", colnames(.))) %>%
      mutate_all(as.numeric) %>%
      round(., digits = 2) %>%
      rownames_to_column(var="ASV")
   
  ################################################
  #How much does the Core microbiome account for: 
  Core  <- pslist[[paste("Core", Datasetname, sep="_")]] %>%
            left_join( #Add relative ASVmeans 
            data.frame(t(phyloseq::otu_table(ps %>%
            transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
            dplyr::mutate(ASVmeans = rowMeans(.)) %>%
            dplyr::mutate(ASV = rownames(.)) %>% 
            dplyr::select(ASV, ASVmeans)) 
  
  Core_ME_dat <- WGCNA_list[[Datasetname]]$SSUWGCNAlist[[paste("SSU", Core_ME,
                                sep="")]]
  
   print("Seasonal")
   print(SeasonSums  %>%
      dplyr::filter(ASV %in% Core$ASV) %>%
      dplyr::select(-ASV) %>%
      colSums() %>%
      as.data.frame()
   )
   
    print("Replicates")
    print(
    RepsSums  %>%
      dplyr::filter(ASV %in% Core$ASV) %>%
      dplyr::select(-ASV) %>%
      colSums() %>%
      as.data.frame()
    )
    
  #############################
  #Core Part of the Microbiome#
  #############################
  print(Datasetname)
  print(paste("Core Taxa account for", round(sum(Core$ASVmeans), digits=2), "%", "of the", 
              Datasetname, "Microbiome"))
  print(paste("Core Taxa", length(Core$ASVmeans)))
  
  print("Sample summary")
  print(SampleSums %>% 
    dplyr::filter(ASV %in% Core$ASV) %>%
    dplyr::select(-ASV) %>%
    colSums() %>%
    summary())
  
  print(paste("sd", sd(SampleSums %>% 
    dplyr::filter(ASV %in% Core$ASV) %>%
    dplyr::select(-ASV) %>%
    colSums())))
  
    print("rel ASVmeans per Phylum in Core")
    print(
      Core %>%
        dplyr::mutate(total_sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
        dplyr::group_by(Phylum) %>%
        dplyr::mutate(sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
        dplyr::summarize(rel_ASVmeans = round(sum(sum_ASVmeans) / sum(total_sum_ASVmeans) * 100,
                                              digits=2))%>%
        dplyr::arrange(desc(rel_ASVmeans)) %>% 
        as.data.frame()
      )
    
    print("rel ASVmeans per Order in Core")
    print(
      Core %>%
        dplyr::mutate(total_sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
        dplyr::group_by(Order) %>%
        dplyr::mutate(sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
        dplyr::summarize(rel_ASVmeans = round(sum(sum_ASVmeans) / sum(total_sum_ASVmeans) * 100,
                                              digits=2))%>%
        dplyr::arrange(desc(rel_ASVmeans)) %>% 
        as.data.frame()
      )
  
  
    print("rel ASVmeans per Taxon in Core")
    print(
      Core %>%
        dplyr::mutate(Taxon = gsub("ASV\\d+:", "", ASV)) %>%
        dplyr::group_by(Taxon) %>%
        dplyr::summarize(Taxon_means = sum(ASVmeans))  %>%
        dplyr::arrange(desc(Taxon_means)) %>% 
        as.data.frame()
      )
    
  ######################
  #Module genes in Core#
  ######################
  print(paste("ME Taxa", length(Core_ME_dat[[paste("ME", Core_ME,
                                sep="")]])))
  length((Core$ASV %in% Core_ME_dat$ASV))

  print(paste(round(length(Core[Core$ASV %in% Core_ME_dat$ASV,]$ASV)/ length(Core$ASV) *100, digits=2),"% Core ASVs in Module", Datasetname, Core_ME))
  
  print(paste("Relative Abundance of", Datasetname, "Core Taxa in", Datasetname, Core_ME))
  print(sum(Core[Core$ASV %in% Core_ME_dat$ASV,]$ASVmeans)/sum(Core$ASVmeans)*100)
  
  print(paste("Relative Abundance of", Datasetname, Core_ME,  "in Core Taxa in", Datasetname))
  print(sum(Core_ME_dat[Core_ME_dat$ASV %in% Core$ASV,]$ASVmeans)/sum(Core_ME_dat$ASVmeans))

  print("-")
  
  
  #####################
  #Core in Waterfilter#
  #####################
    WF_ASVmeans <- pslist$ps_WF %>%
      transform_sample_counts(function(x) {x/sum(x)*100}) %>%
      phyloseq::otu_table() %>%
      t() %>%
      as.data.frame() %>%
            dplyr::mutate(ASVmeans = rowMeans(.)) %>%
            dplyr::mutate(ASV = rownames(.)) %>% 
            dplyr::select(ASV, ASVmeans)
  
  print(paste("Amount of Taxa shared in Core and Bacterioplankton"))
  print(sum(Core$ASV %in% rownames(t(otu_table(pslist$ps_WF)))/length(Core$ASV))*100)
  
  print(paste("Relative Abundance of", Datasetname, "Core Taxa in",  "in Bacterioplankton"))
  print(sum(WF_ASVmeans[WF_ASVmeans$ASV %in% Core$ASV,]$ASVmeans))

  print(paste("Relative Abundance of", Datasetname,  Core_ME,  "in Bacterioplankton"))
  print(sum(WF_ASVmeans[WF_ASVmeans$ASV %in% Core_ME_dat$ASV,]$ASVmeans))
  

  #In section Bacterioplankton vs Fish 
  
 #  #################################
 #  #Overlapping Taxa betweem biomes#
 #  #################################
 #  Overlapping  <- as.data.frame(Overlapping_Taxa) %>%
 #    `colnames<-`("ASV") %>%
 #            left_join( #Add relative ASVmeans 
 #            data.frame(t(phyloseq::otu_table(ps %>%
 #            transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
 #            dplyr::mutate(ASVmeans = rowMeans(.)) %>%
 #            dplyr::mutate(ASV = rownames(.)) %>% 
 #            dplyr::select(ASV, ASVmeans)) 
 # print(paste("Relative Abundance of Overlapping taxa between biomes in", Datasetname))
 # print(sum(Overlapping$ASVmeans))
 #   
 #        print(paste("Relative Abundance of Overlapping taxa between biomes in Bacterioplankton"))
 #  print(sum(WF_ASVmeans[WF_ASVmeans$ASV %in%Overlapping_Taxa,]$ASVmeans))
 # 
 #    
 #  
 #   RepsSums[RepsSums$ASV %in% Overlapping$ASV,] %>% 
 #    dplyr::summarize(across(starts_with("avg_"), sum, na.rm = TRUE))%>% 
 #    as.data.frame()
  
  
}
## Joining with `by = join_by(ASV)`
## [1] "Seasonal"
##                   .
## avg_Autumn_21 14.30
## avg_Spring_22 37.67
## avg_Summer_21 34.53
## avg_Winter_22 33.58
## [1] "Replicates"
##                  .
## avg_OEAU21BB 13.33
## avg_OEAU21MG 14.66
## avg_OEAU21ML 11.81
## avg_OEAU21SS 17.14
## avg_OEAU21TW 14.51
## avg_OESP22BB 43.35
## avg_OESP22MG 32.47
## avg_OESP22ML 38.52
## avg_OESP22SS 40.59
## avg_OESP22TW 33.41
## avg_OESU21BB 41.79
## avg_OESU21MG 38.99
## avg_OESU21ML 18.38
## avg_OESU21SS 37.98
## avg_OESU21TW 30.84
## avg_OEWI22BB 34.03
## avg_OEWI22MG 35.68
## avg_OEWI22ML 30.64
## avg_OEWI22SS 36.71
## avg_OEWI22TW 30.76
## [1] "OE"
## [1] "Core Taxa account for 29.95 % of the OE Microbiome"
## [1] "Core Taxa 27"
## [1] "Sample summary"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.40   19.23   32.55   29.95   40.83   52.36 
## [1] "sd 12.130501795143"
## [1] "rel ASVmeans per Phylum in Core"
##             Phylum rel_ASVmeans
## 1     Bacteroidota        60.57
## 2   Proteobacteria        27.03
## 3 Actinobacteriota         9.16
## 4     Deinococcota         3.23
## [1] "rel ASVmeans per Order in Core"
##                  Order rel_ASVmeans
## 1     Flavobacteriales        56.39
## 2     Enterobacterales        17.88
## 3        Micrococcales         7.61
## 4      Chitinophagales         4.18
## 5          Rhizobiales         3.76
## 6        Deinococcales         3.23
## 7      Pseudomonadales         2.96
## 8      Burkholderiales         1.31
## 9       Microtrichales         0.80
## 10    Sphingomonadales         0.60
## 11     Caulobacterales         0.52
## 12   Corynebacteriales         0.41
## 13 Propionibacteriales         0.34
## [1] "rel ASVmeans per Taxon in Core"
##                           Taxon Taxon_means
## 1               Elizabethkingia  15.6228863
## 2            Enterobacteriaceae   2.5116446
## 3                  Enterobacter   1.9832069
## 4                  Arthrobacter   1.7128911
## 5                Asinibacterium   1.2507845
## 6                   Deinococcus   0.9689178
## 7                    Lelliottia   0.8590406
## 8                 Psychrobacter   0.6132957
## 9                 Mesorhizobium   0.5088703
## 10                Weeksellaceae   0.4624175
## 11 Chryseobacterium.antarcticum   0.4033593
## 12             Chryseobacterium   0.4012136
## 13                     Knoellia   0.3612595
## 14               Bradyrhizobium   0.3294495
## 15                  Pseudomonas   0.2729780
## 16                Ilumatobacter   0.2392540
## 17                   Agrococcus   0.2064247
## 18                 Sphingomonas   0.1811390
## 19                       Afipia   0.1592598
## 20             Caulobacteraceae   0.1568132
## 21          Comamonas.koreensis   0.1519231
## 22            Xanthobacteraceae   0.1288640
## 23                      TRA3-20   0.1282404
## 24                Mycobacterium   0.1241212
## 25                       GOUTA6   0.1122962
## 26                Cutibacterium   0.1007334
## [1] "ME Taxa 72"
## [1] "18.52 % Core ASVs in Module OE 6"
## [1] "Relative Abundance of OE Core Taxa in OE 6"
## [1] 70.54356
## [1] "Relative Abundance of OE 6 in Core Taxa in OE"
## [1] 0.5536707
## [1] "-"
## [1] "Amount of Taxa shared in Core and Bacterioplankton"
## [1] 33.33333
## [1] "Relative Abundance of OE Core Taxa in in Bacterioplankton"
## [1] 1.028735
## [1] "Relative Abundance of OE 6 in Bacterioplankton"
## [1] 0.1076743
## Joining with `by = join_by(ASV)`
## [1] "Seasonal"
##                   .
## avg_Autumn_21 58.77
## avg_Spring_22 60.44
## avg_Summer_21 36.21
## avg_Winter_22 44.01
## [1] "Replicates"
##                  .
## avg_GCAU21BB 50.56
## avg_GCAU21ML 61.51
## avg_GCAU21SS 67.55
## avg_GCAU21TW 51.08
## avg_GCSP22BB 65.14
## avg_GCSP22ML 59.31
## avg_GCSP22SS 56.72
## avg_GCSP22TW 60.78
## avg_GCSU21BB 58.98
## avg_GCSU21ML 22.60
## avg_GCSU21SS 41.44
## avg_GCSU21TW 32.41
## avg_GCWI22BB 34.89
## avg_GCWI22ML 44.45
## avg_GCWI22SS 49.25
## [1] "GC"
## [1] "Core Taxa account for 50.24 % of the GC Microbiome"
## [1] "Core Taxa 64"
## [1] "Sample summary"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.39   33.30   52.17   50.23   67.49   89.12 
## [1] "sd 21.824715724556"
## [1] "rel ASVmeans per Phylum in Core"
##             Phylum rel_ASVmeans
## 1   Proteobacteria        56.08
## 2     Bacteroidota        36.92
## 3 Actinobacteriota         5.16
## 4     Deinococcota         1.83
## [1] "rel ASVmeans per Order in Core"
##                 Order rel_ASVmeans
## 1    Enterobacterales        49.07
## 2    Flavobacteriales        36.00
## 3       Micrococcales         4.10
## 4     Burkholderiales         4.07
## 5     Pseudomonadales         2.11
## 6       Deinococcales         1.83
## 7         Rhizobiales         0.83
## 8     Chitinophagales         0.80
## 9   Corynebacteriales         0.55
## 10     Microtrichales         0.51
## 11 Sphingobacteriales         0.12
## [1] "rel ASVmeans per Taxon in Core"
##                           Taxon Taxon_means
## 1               Elizabethkingia 17.11836905
## 2            Enterobacteriaceae 12.11279350
## 3                   Citrobacter  5.34848554
## 4                  Enterobacter  4.11589997
## 5     Enterobacter.cancerogenus  1.30496659
## 6                  Microvirgula  1.20803246
## 7                    Lelliottia  1.12780959
## 8                   Deinococcus  0.91853579
## 9                  Arthrobacter  0.86307313
## 10                  Pseudomonas  0.61334902
## 11                Weeksellaceae  0.40283825
## 12               Asinibacterium  0.40029341
## 13              Ornithinicoccus  0.39801685
## 14                Psychrobacter  0.37895266
## 15                     Knoellia  0.37296312
## 16 Chryseobacterium.antarcticum  0.35067289
## 17                   Agrococcus  0.26251153
## 18                Ilumatobacter  0.25751606
## 19                     Yersinia  0.23108201
## 20             Chryseobacterium  0.21519016
## 21                      Delftia  0.19187890
## 22          Comamonas.koreensis  0.18401844
## 23                    Aeromonas  0.16625115
## 24           Intrasporangiaceae  0.16356181
## 25                Mycobacterium  0.15009177
## 26                   Klebsiella  0.14766568
## 27                Achromobacter  0.14752195
## 28               Bradyrhizobium  0.12931774
## 29                  Rhodococcus  0.12683854
## 30                       GOUTA6  0.12291035
## 31            Xanthobacteraceae  0.11671916
## 32               Hyphomicrobium  0.11276910
## 33                      TRA3-20  0.10544011
## 34                     Serratia  0.09696374
## 35                    Comamonas  0.08273035
## 36         Pseudomonas.batumici  0.06895924
## 37             Sphingobacterium  0.06154681
## 38            Pseudochrobactrum  0.05883710
## [1] "ME Taxa 61"
## [1] "68.75 % Core ASVs in Module GC 8"
## [1] "Relative Abundance of GC Core Taxa in GC 8"
## [1] 88.17333
## [1] "Relative Abundance of GC 8 in Core Taxa in GC"
## [1] 0.9895447
## [1] "-"
## [1] "Amount of Taxa shared in Core and Bacterioplankton"
## [1] 20.3125
## [1] "Relative Abundance of GC Core Taxa in in Bacterioplankton"
## [1] 1.198428
## [1] "Relative Abundance of GC 8 in Bacterioplankton"
## [1] 0.1146737
## Joining with `by = join_by(ASV)`
## [1] "Seasonal"
##                   .
## avg_Autumn_21 43.51
## avg_Spring_22 41.21
## avg_Summer_21 34.48
## avg_Winter_22 33.99
## [1] "Replicates"
##                  .
## avg_WFAU21TW 43.51
## avg_WFSP22BB 36.19
## avg_WFSP22MG 21.96
## avg_WFSP22ML 53.09
## avg_WFSP22SS 53.55
## avg_WFSU21BB 17.52
## avg_WFSU21MG 27.81
## avg_WFSU21ML 44.44
## avg_WFSU21SS 48.38
## avg_WFWI22BB 36.26
## avg_WFWI22MG 36.75
## avg_WFWI22ML 27.18
## avg_WFWI22SS 30.62
## [1] "WF"
## [1] "Core Taxa account for 37.64 % of the WF Microbiome"
## [1] "Core Taxa 170"
## [1] "Sample summary"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   17.38   29.21   36.61   37.63   46.41   54.79 
## [1] "sd 11.5060520561194"
## [1] "rel ASVmeans per Phylum in Core"
##               Phylum rel_ASVmeans
## 1   Actinobacteriota        39.24
## 2     Proteobacteria        35.31
## 3       Bacteroidota         7.40
## 4  Verrucomicrobiota         5.70
## 5       Nitrospirota         4.64
## 6   Desulfobacterota         2.30
## 7    Acidobacteriota         1.40
## 8    Gemmatimonadota         1.18
## 9      Cyanobacteria         0.91
## 10       Chloroflexi         0.60
## 11   Planctomycetota         0.54
## 12       Myxococcota         0.45
## 13        Firmicutes         0.32
## [1] "rel ASVmeans per Order in Core"
##                                  Order rel_ASVmeans
## 1                           Frankiales        24.40
## 2                      Burkholderiales        20.21
## 3                       Microtrichales        11.14
## 4                   Verrucomicrobiales         5.70
## 5                        Nitrospirales         4.64
## 6                          Rhizobiales         4.55
## 7                      Pseudomonadales         4.12
## 8                      Chitinophagales         3.09
## 9                         Cytophagales         3.04
## 10                       Micrococcales         2.24
## 11                  Steroidobacterales         1.73
## 12                   Corynebacteriales         1.34
## 13                    Flavobacteriales         1.27
## 14                     Desulfobulbales         1.25
## 15                    Gemmatimonadales         1.18
## 16                     Rhodobacterales         1.16
## 17                    Blastocatellales         1.13
## 18                    Cyanobacteriales         0.91
## 19                     Methylococcales         0.85
## 20                    Sphingomonadales         0.80
## 21                  Desulfuromonadales         0.75
## 22                     Caulobacterales         0.62
## 23                      Anaerolineales         0.60
## 24                          Gemmatales         0.54
## 25                        Polyangiales         0.45
## 26  Gammaproteobacteria Incertae Sedis         0.40
## 27                Acidiferrobacterales         0.36
## 28                       Reyranellales         0.36
## 29 Peptostreptococcales-Tissierellales         0.32
## 30                  Vicinamibacterales         0.27
## 31                  Desulfatiglandales         0.18
## 32                     Nitrosococcales         0.14
## 33                   Desulfobacterales         0.13
## 34                          Gaiellales         0.13
## [1] "rel ASVmeans per Taxon in Core"
##                           Taxon Taxon_means
## 1                    hgcI clade  4.83727795
## 2         CL500-29 marine group  3.49193649
## 3               Sporichthyaceae  3.35574208
## 4                 Limnohabitans  1.83683223
## 5                Persicirhabdus  1.81414932
## 6                    Nitrospira  1.61253257
## 7       Candidatus Planktophila  0.99106296
## 8                       TRA3-20  0.93151256
## 9    Rhizobiales Incertae Sedis  0.86384163
## 10  Polynucleobacter.difficilis  0.85347953
## 11    Candidatus Methylopumilus  0.77997753
## 12         Candidatus Limnoluna  0.76513131
## 13               Saprospiraceae  0.71370411
## 14                Ilumatobacter  0.59288656
## 15                  Halioglobus  0.59154649
## 16                Methylotenera  0.54775641
## 17              Pseudohongiella  0.52498050
## 18                Mycobacterium  0.50457620
## 19                      Woeseia  0.48299609
## 20               Pseudarcicella  0.47663873
## 21             Polynucleobacter  0.47629849
## 22                        A0839  0.40863356
## 23                      Hoppeia  0.38824829
## 24     Planktothrix NIVA-CYA 15  0.34424637
## 25            Gemmatimonadaceae  0.33580787
## 26        Candidatus Aquirestis  0.32856382
## 27         Limnohabitans.curvus  0.32365086
## 28                Methylobacter  0.32141650
## 29   Sphingorhabdus.wooponensis  0.30297773
## 30   Limnohabitans.planktonicus  0.29903573
## 31             Desulfobulbaceae  0.29254146
## 32              JGI 0001001-H03  0.25643981
## 33                 Algoriphagus  0.25481414
## 34                      Sva1033  0.23701807
## 35               Comamonadaceae  0.23133487
## 36              Anaerolineaceae  0.22564370
## 37                 Boseongicola  0.21997164
## 38             Rhodobacteraceae  0.21780302
## 39                       DEV007  0.20403019
## 40                  Gemmataceae  0.20230230
## 41                        OLB12  0.19960325
## 42                      SC-I-84  0.18828257
## 43 Polynucleobacter.acidiphobus  0.18410785
## 44                     Hirschia  0.17910742
## 45             Desulfocapsaceae  0.17710577
## 46             OM60(NOR5) clade  0.17542745
## 47              Sandaracinaceae  0.16849765
## 48          Steroidobacteraceae  0.16643033
## 49               Sutterellaceae  0.16057292
## 50                  KI89A clade  0.15845129
## 51                         oc32  0.15141838
## 52               Unknown Family  0.15141725
## 53                Sulfurifustis  0.13563609
## 54             Nitrospira.lenta  0.13314945
## 55            Xanthobacteraceae  0.12972283
## 56                Luteolibacter  0.12828495
## 57              Microscillaceae  0.12429488
## 58                   Romboutsia  0.12233856
## 59              Ferruginibacter  0.12001499
## 60                 Gemmatimonas  0.10977796
## 61         Sva0996 marine group  0.10960205
## 62          Vicinamibacteraceae  0.09979998
## 63                Blastocatella  0.09861016
## 64                    Lautropia  0.09326635
## 65                 Chryseolinea  0.09002443
## 66                       GOUTA6  0.08954718
## 67                     Eudoraea  0.08811628
## 68       GKS98 freshwater group  0.08571117
## 69                         MND1  0.08568549
## 70               Hyphomicrobium  0.08292067
## 71         Candidatus Nitrotoga  0.07812251
## 72                 Sulfuritalea  0.07784905
## 73           Rhodoluna.lacicola  0.07676456
## 74                   KF-JG30-B3  0.07265650
## 75            Stenotrophobacter  0.07175602
## 76            Hyphomicrobiaceae  0.06950940
## 77                   Reyranella  0.06919873
## 78               Desulfatiglans  0.06747465
## 79         Reyranella.aquatilis  0.06711296
## 80                 Nitrosospira  0.06342258
## 81                Pedomicrobium  0.05506956
## 82                        SWB02  0.05469795
## 83                        SZB85  0.05417345
## 84                  BD1-7 clade  0.05290782
## 85                      Gaiella  0.04710704
## 86                     Gven-F17  0.04627225
## 87              Geothermobacter  0.04587001
## 88                    Ellin6067  0.04006136
## 89                Chthonobacter  0.03142548
## 90       Sva0081 sediment group  0.02972735
## 91            Nitrosomonadaceae  0.02830726
## 92               Desulfosarcina  0.01776697
## [1] "ME Taxa 66"
## [1] "1.18 % Core ASVs in Module WF 8"
## [1] "Relative Abundance of WF Core Taxa in WF 8"
## [1] 0.9622309
## [1] "Relative Abundance of WF 8 in Core Taxa in WF"
## [1] 0.1765058
## [1] "-"
## [1] "Amount of Taxa shared in Core and Bacterioplankton"
## [1] 100
## [1] "Relative Abundance of WF Core Taxa in in Bacterioplankton"
## [1] 37.64352
## [1] "Relative Abundance of WF 8 in Bacterioplankton"
## [1] 2.052157
##############################
#Sums for modules of interest#
##############################
for (Datasetname in names(WGCNA_list)[grepl("OE|GC", names(WGCNA_list))]) {
  #Datasetname <- sub("ps_","", Dataset)
  
  ps <- pslist[[paste("ps", Datasetname, sep="_")]]
  
  if (Datasetname == "OE") {
    Interest_ME <- "0"
  } else if (Datasetname == "GC") {
    Interest_ME <- "7"
  }
    
  SampleSums <- ps %>%
      transform_sample_counts(function(x) {x/sum(x)*100}) %>%
      phyloseq::otu_table() %>%
      as.data.frame() %>%
      rownames_to_column(var = "SampleID") %>%
      dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
      dplyr::group_by(SampleID) %>%
      dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>% 
      t() %>%
      as.data.frame() %>%
      `colnames<-`(.[1, ]) %>%
      .[-1, ] %>%
      stats::setNames(paste0("avg_", colnames(.))) %>%
      mutate_all(as.numeric) %>%
      round(., digits = 2) %>%
      rownames_to_column(var="ASV")
  
   SeasonSums <- ps %>%
      transform_sample_counts(function(x) {x/sum(x)*100}) %>%
      phyloseq::otu_table() %>%
      as.data.frame() %>%
      rownames_to_column(var = "SampleID") %>%
      dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
      dplyr::group_by(Season) %>%
      dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>% 
      t() %>%
      as.data.frame() %>%
      `colnames<-`(.[1, ]) %>%
      .[-1, ] %>%
      stats::setNames(paste0("avg_", colnames(.))) %>%
      mutate_all(as.numeric) %>%
      round(., digits = 2) %>%
      rownames_to_column(var="ASV")
   
   RepsSums <- ps %>%
      transform_sample_counts(function(x) {x/sum(x)*100}) %>%
      phyloseq::otu_table() %>%
      as.data.frame() %>%
      rownames_to_column(var = "SampleID") %>%
      dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
      dplyr::group_by(Replicates) %>%
      dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>% 
      t() %>%
      as.data.frame() %>%
      `colnames<-`(.[1, ]) %>%
      .[-1, ] %>%
      stats::setNames(paste0("avg_", colnames(.))) %>%
      mutate_all(as.numeric) %>%
      round(., digits = 2) %>%
      rownames_to_column(var="ASV")
   
  ################################################
  Interest_ME_dat <- WGCNA_list[[Datasetname]]$SSUWGCNAlist[[paste("SSU", Interest_ME,
                                sep="")]]
  
   print("Seasonal")
   print(SeasonSums  %>%
      dplyr::filter(ASV %in% Interest_ME_dat$ASV) %>%
      dplyr::select(-ASV) %>%
      colSums() %>%
      as.data.frame()
   )
    print("Replicates")
    print(
    RepsSums  %>%
      dplyr::filter(ASV %in% Interest_ME_dat$ASV) %>%
      dplyr::select(-ASV) %>%
      colSums() %>%
      as.data.frame()
    )
    
  
    print("rel ASVmeans per Order in Core")
    print(
      head(Interest_ME_dat %>%
        dplyr::mutate(total_sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
        dplyr::group_by(ASV) %>%
        dplyr::mutate(sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
        dplyr::summarize(rel_ASVmeans = round(sum(sum_ASVmeans) / sum(total_sum_ASVmeans) * 100,
                                              digits=2))%>%
        dplyr::arrange(desc(rel_ASVmeans)) %>% 
        as.data.frame() %>% 
        left_join(Interest_ME_dat[c("Phylum", "Class", "Order", "Family", "Genus", "Species", "ASV")])
      ))
    
      print("rel ASVmeans per Genus in Core")
      print(
        head(Interest_ME_dat %>%
        dplyr::mutate(total_sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
        dplyr::group_by(Genus) %>%
        dplyr::mutate(sum_ASVmeans = sum(ASVmeans, na.rm = TRUE)) %>%
        dplyr::summarize(rel_ASVmeans = round(sum(sum_ASVmeans) / sum(total_sum_ASVmeans) * 100,
                                              digits=2))%>%
        dplyr::arrange(desc(rel_ASVmeans)) %>% 
        as.data.frame() 
      ))
    
    
  #####################
  #Interest in Waterfilter#
  #####################
    WF_ASVmeans <- pslist$ps_WF %>%
      transform_sample_counts(function(x) {x/sum(x)*100}) %>%
      phyloseq::otu_table() %>%
      t() %>%
      as.data.frame() %>%
            dplyr::mutate(ASVmeans = rowMeans(.)) %>%
            dplyr::mutate(ASV = rownames(.)) %>% 
            dplyr::select(ASV, ASVmeans)
  
  print(paste("Amount of Taxa shared", Datasetname, Interest_ME,  "and Bacterioplankton"))
  print(sum(Interest_ME_dat$ASV %in% rownames(t(otu_table(pslist$ps_WF)))/length(Interest_ME_dat$ASV ))*100)
  
  print(paste("Relative Abundance of", Datasetname, Interest_ME,  "in Bacterioplankton"))
  print(sum(WF_ASVmeans[WF_ASVmeans$ASV %in% Interest_ME_dat$ASV,]$ASVmeans))


}
## [1] "Seasonal"
##                   .
## avg_Autumn_21 49.31
## avg_Spring_22 19.98
## avg_Summer_21 20.75
## avg_Winter_22 21.16
## [1] "Replicates"
##                  .
## avg_OEAU21BB 51.81
## avg_OEAU21MG 30.52
## avg_OEAU21ML 55.70
## avg_OEAU21SS 55.72
## avg_OEAU21TW 52.76
## avg_OESP22BB 15.44
## avg_OESP22MG 23.03
## avg_OESP22ML 22.75
## avg_OESP22SS 17.97
## avg_OESP22TW 20.55
## avg_OESU21BB 17.00
## avg_OESU21MG 16.85
## avg_OESU21ML 29.97
## avg_OESU21SS 23.04
## avg_OESU21TW 19.19
## avg_OEWI22BB 22.16
## avg_OEWI22MG 24.00
## avg_OEWI22ML 22.43
## avg_OEWI22SS 19.43
## avg_OEWI22TW 18.48
## [1] "rel ASVmeans per Order in Core"
## Joining with `by = join_by(ASV)`
##                                ASV rel_ASVmeans            Phylum
## 1              ASV11:Luteolibacter         8.39 Verrucomicrobiota
## 2             ASV16:Asinibacterium         4.45      Bacteroidota
## 3 ASV9:Clostridium sensu stricto 1         3.53        Firmicutes
## 4             ASV19:Endozoicomonas         3.48    Proteobacteria
## 5                ASV30:Deinococcus         2.26      Deinococcota
## 6              ASV33:Mesorhizobium         1.81    Proteobacteria
##                 Class              Order              Family
## 1    Verrucomicrobiae Verrucomicrobiales      Rubritaleaceae
## 2         Bacteroidia    Chitinophagales    Chitinophagaceae
## 3          Clostridia      Clostridiales      Clostridiaceae
## 4 Gammaproteobacteria    Pseudomonadales Endozoicomonadaceae
## 5          Deinococci      Deinococcales      Deinococcaceae
## 6 Alphaproteobacteria        Rhizobiales        Rhizobiaceae
##                         Genus Species
## 1               Luteolibacter    <NA>
## 2              Asinibacterium    <NA>
## 3 Clostridium sensu stricto 1    <NA>
## 4              Endozoicomonas    <NA>
## 5                 Deinococcus    <NA>
## 6               Mesorhizobium    <NA>
## [1] "rel ASVmeans per Genus in Core"
##                         Genus rel_ASVmeans
## 1                        <NA>        14.43
## 2               Luteolibacter        13.25
## 3               Psychrobacter         7.32
## 4              Asinibacterium         4.53
## 5 Clostridium sensu stricto 1         4.47
## 6                 Deinococcus         4.03
## [1] "Amount of Taxa shared OE 0 and Bacterioplankton"
## [1] 51.43639
## [1] "Relative Abundance of OE 0 in Bacterioplankton"
## [1] 33.79109
## [1] "Seasonal"
##                  .
## avg_Autumn_21 1.04
## avg_Spring_22 1.02
## avg_Summer_21 0.70
## avg_Winter_22 0.21
## [1] "Replicates"
##                 .
## avg_GCAU21BB 1.21
## avg_GCAU21ML 0.24
## avg_GCAU21SS 1.45
## avg_GCAU21TW 1.72
## avg_GCSP22BB 2.56
## avg_GCSP22ML 0.12
## avg_GCSP22SS 0.82
## avg_GCSP22TW 0.11
## avg_GCSU21BB 2.79
## avg_GCSU21ML 0.11
## avg_GCSU21SS 0.52
## avg_GCSU21TW 0.10
## avg_GCWI22BB 0.20
## avg_GCWI22ML 0.19
## avg_GCWI22SS 0.32
## [1] "rel ASVmeans per Order in Core"
## Joining with `by = join_by(ASV)`
##                     ASV rel_ASVmeans            Phylum               Class
## 1    ASV238:Halioglobus         5.24    Proteobacteria Gammaproteobacteria
## 2  ASV289:Ilumatobacter         4.51  Actinobacteriota      Acidimicrobiia
## 3   ASV56:Luteolibacter         4.04 Verrucomicrobiota    Verrucomicrobiae
## 4        ASV425:TRA3-20         4.03    Proteobacteria Gammaproteobacteria
## 5 ASV321:Persicirhabdus         3.71 Verrucomicrobiota    Verrucomicrobiae
## 6        ASV537:TRA3-20         2.78    Proteobacteria Gammaproteobacteria
##                Order             Family          Genus Species
## 1    Pseudomonadales         Halieaceae    Halioglobus    <NA>
## 2     Microtrichales Ilumatobacteraceae  Ilumatobacter    <NA>
## 3 Verrucomicrobiales     Rubritaleaceae  Luteolibacter    <NA>
## 4    Burkholderiales            TRA3-20           <NA>    <NA>
## 5 Verrucomicrobiales     Rubritaleaceae Persicirhabdus    <NA>
## 6    Burkholderiales            TRA3-20           <NA>    <NA>
## [1] "rel ASVmeans per Genus in Core"
##                     Genus rel_ASVmeans
## 1                    <NA>        23.17
## 2             Halioglobus        17.63
## 3          Persicirhabdus         8.01
## 4           Ilumatobacter         6.69
## 5 Candidatus Symbiobacter         4.79
## 6           Luteolibacter         4.66
## [1] "Amount of Taxa shared GC 7 and Bacterioplankton"
## [1] 98.4127
## [1] "Relative Abundance of GC 7 in Bacterioplankton"
## [1] 5.696795
sum(rownames(WGCNA_list[["GC"]]$SSUWGCNAlist[[paste("SSU", 8,
                                sep="")]]) %in% 
rownames(WGCNA_list[["OE"]]$SSUWGCNAlist[[paste("SSU", 6,
                                sep="")]]))/length(rownames(WGCNA_list[["GC"]]$SSUWGCNAlist[[paste("SSU", 8,
                                sep="")]]) )*100
## [1] 100

9.3.6 Barplot Specific Taxa

colored_taxonomy_Fish <- pslist$Optics$colored_taxonomy_Fish

##############
#Network CORE#
##############
TaxLevel <- "ASV"
flipped <- T
for (Dataset in names(pslist)[grepl("ps_GCWF|ps_OEWF", names(pslist))]) {

  require(plyr); require(dplyr); require(ggrepel); require(cowplot); require(phyloseq)

    Datasetname <- sub("ps_", "", Dataset)

    # GroupOfInterest_OE <- na.omit(unique(WGCNA_list[["OE"]]$SSUWGCNAlist[["SSU7"]]$ASV))
    # GroupOfInterest_GC <- na.omit(unique(WGCNA_list[["GC"]]$SSUWGCNAlist[["SSU2"]]$ASV))
    # print(paste(sum(GroupOfInterest_OE %in% GroupOfInterest_GC)/length(GroupOfInterest_OE)*100, "% of OE Module SSU7 are in GC Module SSU2"))
    #
    # GroupOfInterest <- c(GroupOfInterest_OE, GroupOfInterest_GC)
    # #GroupOfInterest <- sub(".*:", "", GroupOfInterest)
    # GroupOfInterest <-unique(GroupOfInterest)

    if (grepl("GC", Datasetname)) {
      MODULE <- "SSU8"
    } else if (grepl("OE", Datasetname)) {
      MODULE <- "SSU6"
    }

    ps <- pslist[[Dataset]]

    GroupOfInterest <- na.omit(unique(WGCNA_list[[sub("WF", "",
                                                      Datasetname)]]$SSUWGCNAlist[[MODULE]]$ASV))

    FILENAME    <- paste(paste(save_name, "Alpha_BarPlot_WF", MODULE, Datasetname,  sep="_"), ".png", sep="")

    if (Datasetname %in% c("GC", "OE")) {
      WIDTH <- 10 + length(sample_names(pslist[[Dataset]])) *0.12
      HEIGHT <- 10 #length(sample_names(pslist[[Dataset]])) *0.08
    } else if (Datasetname %in% c("WF")) {
      WIDTH <-  5 + length(sample_names(pslist[[Dataset]])) *0.12
       HEIGHT <- 10 #length(sample_names(pslist[[Dataset]])) *0.08
    } else {
     WIDTH <-  10 + length(sample_names(pslist[[Dataset]])) *0.12
      HEIGHT <- 10 #length(sample_names(pslist[[Dataset]])) *0.08
    }

    ################################
    #Create Relative Abundance Data#
    ################################
    ps_alpha_barplot <- ps %>%
    #tax_glom(taxrank =  TaxLevel)   %>%
    transform_sample_counts(function(x) {x/sum(x)*100}) %>% # Transform to rel. abundance
    psmelt() %>%                                         # Melt to long format
    #filter(Abundance > 1) %>%                       # Filter out low abundance taxa
    arrange(Genus)        %>%                                # Sort data frame alphabetically by phylum
    dplyr::arrange(desc(Abundance))

    ps_alpha_barplot$ASV <- sub(".*:", "", ps_alpha_barplot$OTU)
    #ps_alpha_barplot$ASV <- sub('\\.', ' ', ps_alpha_barplot$ASV)

    ############################
    #Create TotalAbundance Data#
    ############################
    # phylum_abundance <- ps_alpha_barplot %>%
    #   dplyr::group_by(.data[[TaxLevel]]) %>%
    #   dplyr::summarise(TotalAbundance = sum(Abundance))
    # ordered_levels <- phylum_abundance %>%
    #   dplyr::arrange(desc(TotalAbundance)) %>%
    #   pull(.data[[TaxLevel]])
    #
    # ps_alpha_barplot$Taxa <- factor(ps_alpha_barplot[[TaxLevel]], levels = ordered_levels)

    ps_alpha_barplot$SampleID <- factor(ps_alpha_barplot$SampleID, levels =
                                          SSU_Samples[SSU_Samples %in% sample_names(pslist[[Dataset]])])

    SSU_Samples[SSU_Samples %in% sample_names(pslist[[Dataset]])]

    ##########################
    #Subset for Interest ASVs#
    ##########################
    df <- ps_alpha_barplot #[ps_alpha_barplot$OTU %in% GroupOfInterest,]
    #df <- df[df$OTU %in% GroupOfInterest,]
    df$Abundance[!(df$OTU %in% GroupOfInterest)] <- 0

    phylum_abundance <- df %>%
      dplyr::group_by(.data[[TaxLevel]]) %>%
      dplyr::summarise(TotalAbundance = sum(Abundance))
    ordered_levels <- phylum_abundance %>%
      dplyr::arrange(desc(TotalAbundance)) %>%
      pull(.data[[TaxLevel]])

    df$Taxa <- factor(df[[TaxLevel]], levels = ordered_levels)

    Alpha_Diversity_df <- df
    Alpha_Diversity_df$Colors <- NA
    Alpha_Diversity_df$Reps <-  factor(Alpha_Diversity_df$Reps, levels = RepOrder[RepOrder %in% Alpha_Diversity_df$Reps])

    #################################
    #Update to colored_taxonomy_Fish#
    #################################

    taxa_levels <- c("Genus", "Family", "Order", "Class", "Phylum")
    taxa_level2 <- c("Taxa", "Phylum2")

    Alpha_Diversity_df$Color <- "grey"

  # Loop through each row of Alpha_Diversity_df
  for (i in 1:nrow(Alpha_Diversity_df)) {
    matching_color <- NA  # Initialize matching color to NA
    # Loop through each taxonomic level
    for (level in taxa_levels) {
    matching_row <- colored_taxonomy_Fish[colored_taxonomy_Fish$Taxa %in%
                                            Alpha_Diversity_df[[level]][i], ]
    # If there's a match, update matching_color and exit the loop
    if (nrow(matching_row) > 0) {
      matching_color <- matching_row$Color
      break
    }
  }
  # If there's no match at any level, try matching without numbers
  if (is.na(matching_color)) {
    matching_row <- colored_taxonomy_Fish[gsub("\\d", "", colored_taxonomy_Fish$Phylum2) %in%
                                            Alpha_Diversity_df[["Phylum"]][i], ]
    if (nrow(matching_row) > 0) {
      matching_color <- matching_row$Color
    }
  }
  # Assign the matching color to the corresponding row in Alpha_Diversity_df
  Alpha_Diversity_df$Color[i] <- matching_color
  }

    color_mapping <- setNames(Alpha_Diversity_df$Color, Alpha_Diversity_df$Taxa)
    color_mapping[["Candidatus Megaira"]] <- "purple"
    color_mapping[["Acinetobacter.lwoffii"]] <- "#996600"
    color_mapping[["Acinetobacter.johnsonii"]] <- "#663300"
    color_mapping[["Shewanella"]] <- "#FF3366"
    color_mapping[["Shewanella.baltica"]] <- "#FF0099"
    color_mapping[["Shewanella.putrefaciens"]] <-"#FF0066"
    color_mapping[["Photobacterium"]] <- "#FFF000"
    color_mapping[["Aeromonas"]] <- "#FFCC00"

    levels(Alpha_Diversity_df$Taxa)[!levels(Alpha_Diversity_df$Taxa) %in% head(unique(sub(".*:", "", GroupOfInterest)), 34)  ] <-
      "Other"

    if (flipped == FALSE) {
      if (Datasetname == "WF") {
        p <- ggplot(Alpha_Diversity_df,
          aes(x = SampleID, y = Abundance, fill = factor(Taxa))) +
          geom_bar(stat = "identity") +
          facet_grid(. ~ factor(Season, levels = SeasonOrder), drop = TRUE, scale = "free",
                     space = "free_x")
        COL <- col.Palette$col.Palette.Season[names(col.Palette$col.Palette.Season) %in%
                                              unique(sample_data(ps)$Season)]
      } else {
       p <- ggplot(Alpha_Diversity_df,
        aes(x = SampleID, y = Abundance, fill = factor(Taxa))) +
        geom_bar(stat = "identity") +
        facet_grid(. ~ factor(Reps, levels = RepOrder), drop = TRUE, scale = "free", space = "free_y")
      COL <- col.Palette$col.Palette.Reps[names(col.Palette$col.Palette.Reps) %in%                                    unique(sample_data(ps)$Reps)]
      }
      p <- p +
        atheme +
        ylab("Relative Abundance [%] \n") + xlab("") +
        labs(fill="") +
        scale_fill_manual(values = color_mapping) +
        ylim(0, 60) +

        #ggrepel::geom_text_repel(x = df$SampleID, y = df$Abundance,  aes(label = Labels),
        #inherit.aes = FALSE, min.segment.length = 0,nudge_y = 20,size=3, max.overlaps = Inf) +
      #awhite +
        theme(strip.text.y = element_text(angle = 0))  +
        #theme(axis.text.x=element_blank(),
        #axis.text.y=element_blank(),
        #axis.title.y.left = element_blank(),
        #axis.ticks.y =element_blank()
        #) +
        theme(
         panel.background = element_rect(fill='transparent'), #transparent panel bg
         plot.background = element_rect(fill='transparent', color=NA), #transparent plot bg
         #legend.background = element_rect(fill='transparent'), #transparent legend bg
         #legend.box.background = element_rect(fill='transparent') #transparent legend panel
         ) +
      theme(axis.title.x.bottom =  element_text(color="grey13"),
        strip.text = element_text(color = "black", face= "bold")) +
      theme(
        legend.title=element_text(size=9),
        legend.text=element_text(size=9),
        axis.text.x.bottom = element_text(size= 9, face = "bold", angle = 45, hjust = 1),
        strip.text.y.left = element_text(size = 9,face = "bold"),
        axis.title.y.left = element_text(size=12,face = "bold"))

      g <- ggplot_gtable(ggplot_build(p))
        stripr <- which(grepl('strip-t', g$layout$name))
        fills <- alpha(COL, 0.8)
        k <- 1
        for (iii in stripr) {
        j <- which(grepl('rect', g$grobs[[iii]]$grobs[[1]]$childrenOrder))
        g$grobs[[iii]]$grobs[[1]]$children[[j]]$gp$fill <- fills[k]
        k <- k+1}

        Alpha_Diversity_BarPlot <- cowplot::plot_grid(g, labels = c(""), ncol = 1)
        Alpha_Diversity_BarPlot <- plot_grid(Alpha_Diversity_BarPlot, ncol = 1, rel_heights = c(100))
        ggsave(Alpha_Diversity_BarPlot, filename = FILENAME, path = pathPlots, device='png', dpi=300,
               width = WIDTH, height = 6)

    } else if (flipped == TRUE) {
      # New facet label names for supp variable

      COL <- col.Palette$col.Palette.Reps[names(col.Palette$col.Palette.Reps) %in%                                    unique(sample_data(ps)$Reps)]

      Short.labs <- gsub("[^0-9]", "", Alpha_Diversity_df$Reps)
      names(Short.labs) <- Alpha_Diversity_df$Reps

      p <- ggplot(Alpha_Diversity_df, aes(x = SampleID, y = Abundance, fill = factor(Taxa))) +
            geom_bar(stat = "identity") +
            coord_flip() +
            #facet_grid(rows = vars(Reps), drop = TRUE, scale = "free", space = "free_x", switch = "y")
        facet_grid(rows = vars(Reps), drop = TRUE, scale = "free", space = "free_x", switch = "y",
              labeller = labeller(Reps = Short.labs))

      p <- p +
        ylab("Relative Abundance [%] \n") + xlab("") +
        labs(fill="") + #atheme+
        scale_fill_manual(values = color_mapping) +
        ylim(0, 60) +
            theme(axis.title.y=element_blank(),
            axis.text.y=element_blank(),
            axis.ticks.y=element_blank()) +
        theme(
         panel.background = element_rect(fill='transparent'), #transparent panel bg
         plot.background = element_rect(fill='transparent', color=NA), #transparent plot bg
         ) +
      theme(axis.title.x.bottom =  element_text(color="grey13", size=15, face ="bold"),
        strip.text = element_text(color = "black", face= "bold")) +
      theme(
        legend.title=element_text(size=9),
        legend.text=element_text(size=9),
        axis.text.x.bottom = element_text(size= 15, face = "bold", angle = 45, hjust = 1),
        #strip.text.y.left = element_text(size = 9,face = "bold"),
        #axis.title.y.left = element_text(size=12,face = "bold")
        ) +
        theme(legend.position = "none")

      g <- ggplot_gtable(ggplot_build(p))
        stripr <- which(grepl('strip-l', g$layout$name))
        fills <- alpha(COL, 0.8)
        k <- 1
        for (iii in stripr) {
        j <- which(grepl('rect', g$grobs[[iii]]$grobs[[1]]$childrenOrder))
        g$grobs[[iii]]$grobs[[1]]$children[[j]]$gp$fill <- fills[k]
        k <- k+1}

    Alpha_Diversity_BarPlot <- cowplot::plot_grid(g, labels = c(""), ncol = 1)
    Alpha_Diversity_BarPlot <- plot_grid(Alpha_Diversity_BarPlot, ncol = 1, rel_heights = c(100))
    ggsave(Alpha_Diversity_BarPlot, filename = FILENAME, path = pathPlots, device='png', dpi=300, width = 2.7, height = HEIGHT)

    }
}

##########
#Salinity#
##########
TaxLevel <- "ASV"
flipped <- T
for (Dataset in names(pslist)[grepl("ps_GCWF|ps_OEWF", names(pslist))]) {
  
  require(plyr); require(dplyr); require(ggrepel); require(cowplot); require(phyloseq)
     
    Datasetname <- sub("ps_", "", Dataset)
    
  
    # GroupOfInterest_OE <- na.omit(unique(WGCNA_list[["OE"]]$SSUWGCNAlist[["SSU7"]]$ASV))
    # GroupOfInterest_GC <- na.omit(unique(WGCNA_list[["GC"]]$SSUWGCNAlist[["SSU2"]]$ASV))
    # print(paste(sum(GroupOfInterest_OE %in% GroupOfInterest_GC)/length(GroupOfInterest_OE)*100, "% of OE Module SSU7 are in GC Module SSU2"))
    # 
    # GroupOfInterest <- c(GroupOfInterest_OE, GroupOfInterest_GC)
    # #GroupOfInterest <- sub(".*:", "", GroupOfInterest)  
    # GroupOfInterest <-unique(GroupOfInterest)
    
    if (grepl("GC", Datasetname)) {
      MODULE <- "SSU7"
    } else if (grepl("OE", Datasetname)) {
      MODULE <- "SSU3"
    }
    
    ps <- pslist[[Dataset]] 
    
    GroupOfInterest <- na.omit(unique(WGCNA_list[[sub("WF", "", 
                                                      Datasetname)]]$SSUWGCNAlist[[MODULE]]$ASV))
    

    FILENAME    <- paste(paste(save_name, "Alpha_BarPlot_WF", MODULE, Datasetname,  sep="_"), ".png", sep="")
    
    if (Datasetname %in% c("GC", "OE")) {
      WIDTH <- 10 + length(sample_names(pslist[[Dataset]])) *0.12
      HEIGHT <- 10 #length(sample_names(pslist[[Dataset]])) *0.08
    } else if (Datasetname %in% c("WF")) {
      WIDTH <-  5 + length(sample_names(pslist[[Dataset]])) *0.12
       HEIGHT <- 10 #length(sample_names(pslist[[Dataset]])) *0.08
    } else {
     WIDTH <-  10 + length(sample_names(pslist[[Dataset]])) *0.12
      HEIGHT <- 10 #length(sample_names(pslist[[Dataset]])) *0.08
    }
    
  
    ################################
    #Create Relative Abundance Data#
    ################################
    ps_alpha_barplot <- ps %>%
    #tax_glom(taxrank =  TaxLevel)   %>%  
    transform_sample_counts(function(x) {x/sum(x)*100}) %>% # Transform to rel. abundance
    psmelt() %>%                                         # Melt to long format
    #filter(Abundance > 1) %>%                       # Filter out low abundance taxa
    arrange(Genus)        %>%                                # Sort data frame alphabetically by phylum
    dplyr::arrange(desc(Abundance))
 
  
    ps_alpha_barplot$ASV <- sub(".*:", "", ps_alpha_barplot$OTU)  
    #ps_alpha_barplot$ASV <- sub('\\.', ' ', ps_alpha_barplot$ASV)
    
    ############################
    #Create TotalAbundance Data#
    ############################
    # phylum_abundance <- ps_alpha_barplot %>%
    #   dplyr::group_by(.data[[TaxLevel]]) %>%
    #   dplyr::summarise(TotalAbundance = sum(Abundance))
    # ordered_levels <- phylum_abundance %>%
    #   dplyr::arrange(desc(TotalAbundance)) %>%
    #   pull(.data[[TaxLevel]])
    #   
    # ps_alpha_barplot$Taxa <- factor(ps_alpha_barplot[[TaxLevel]], levels = ordered_levels)

    ps_alpha_barplot$SampleID <- factor(ps_alpha_barplot$SampleID, levels = 
                                          SSU_Samples[SSU_Samples %in% sample_names(pslist[[Dataset]])])
  
    
    SSU_Samples[SSU_Samples %in% sample_names(pslist[[Dataset]])]
    
    ##########################
    #Subset for Interest ASVs#
    ##########################
    df <- ps_alpha_barplot #[ps_alpha_barplot$OTU %in% GroupOfInterest,]
    #df <- df[df$OTU %in% GroupOfInterest,]
    df$Abundance[!(df$OTU %in% GroupOfInterest)] <- 0
    
    phylum_abundance <- df %>%
      dplyr::group_by(.data[[TaxLevel]]) %>%
      dplyr::summarise(TotalAbundance = sum(Abundance))
    ordered_levels <- phylum_abundance %>%
      dplyr::arrange(desc(TotalAbundance)) %>%
      pull(.data[[TaxLevel]])
      
    df$Taxa <- factor(df[[TaxLevel]], levels = ordered_levels)
    
    Alpha_Diversity_df <- df
    Alpha_Diversity_df$Colors <- NA
    Alpha_Diversity_df$Reps <-  factor(Alpha_Diversity_df$Reps, levels = RepOrder[RepOrder %in% Alpha_Diversity_df$Reps])
    
    #################################
    #Update to colored_taxonomy_Fish#
    #################################

    taxa_levels <- c("Genus", "Family", "Order", "Class", "Phylum")
    taxa_level2 <- c("Taxa", "Phylum2")

    Alpha_Diversity_df$Color <- "grey"

  # Loop through each row of Alpha_Diversity_df
  for (i in 1:nrow(Alpha_Diversity_df)) {
    matching_color <- NA  # Initialize matching color to NA
    # Loop through each taxonomic level
    for (level in taxa_levels) {
    matching_row <- colored_taxonomy_Fish[colored_taxonomy_Fish$Taxa %in%
                                            Alpha_Diversity_df[[level]][i], ]
    # If there's a match, update matching_color and exit the loop
    if (nrow(matching_row) > 0) {
      matching_color <- matching_row$Color
      break
    }
  }
  # If there's no match at any level, try matching without numbers
  if (is.na(matching_color)) {
    matching_row <- colored_taxonomy_Fish[gsub("\\d", "", colored_taxonomy_Fish$Phylum2) %in%
                                            Alpha_Diversity_df[["Phylum"]][i], ]
    if (nrow(matching_row) > 0) {
      matching_color <- matching_row$Color
    }
  }
  # Assign the matching color to the corresponding row in Alpha_Diversity_df
  Alpha_Diversity_df$Color[i] <- matching_color
  }
    
    color_mapping <- setNames(Alpha_Diversity_df$Color, Alpha_Diversity_df$Taxa)
    color_mapping[["Candidatus Megaira"]] <- "purple"
    color_mapping[["Acinetobacter.lwoffii"]] <- "#996600"
    color_mapping[["Acinetobacter.johnsonii"]] <- "#663300"
    color_mapping[["Shewanella"]] <- "#FF3366"
    color_mapping[["Shewanella.baltica"]] <- "#FF0099"
    color_mapping[["Shewanella.putrefaciens"]] <-"#FF0066"
    color_mapping[["Photobacterium"]] <- "#FFF000"
    color_mapping[["Aeromonas"]] <- "#FFCC00"
    

    levels(Alpha_Diversity_df$Taxa)[!levels(Alpha_Diversity_df$Taxa) %in% head(unique(sub(".*:", "", GroupOfInterest)), 34)  ] <-
      "Other"

    if (flipped == FALSE) {
      if (Datasetname == "WF") {
        p <- ggplot(Alpha_Diversity_df, 
          aes(x = SampleID, y = Abundance, fill = factor(Taxa))) + 
          geom_bar(stat = "identity") +
          facet_grid(. ~ factor(Season, levels = SeasonOrder), drop = TRUE, scale = "free", 
                     space = "free_x")
        COL <- col.Palette$col.Palette.Season[names(col.Palette$col.Palette.Season) %in%
                                              unique(sample_data(ps)$Season)]
      } else {
       p <- ggplot(Alpha_Diversity_df, 
        aes(x = SampleID, y = Abundance, fill = factor(Taxa))) + 
        geom_bar(stat = "identity") +
        facet_grid(. ~ factor(Reps, levels = RepOrder), drop = TRUE, scale = "free", space = "free_y")
      COL <- col.Palette$col.Palette.Reps[names(col.Palette$col.Palette.Reps) %in%                                    unique(sample_data(ps)$Reps)]
      }
      p <- p + 
        atheme +
        ylab("Relative Abundance [%] \n") + xlab("") + 
        labs(fill="") +
        scale_fill_manual(values = color_mapping) +
        ylim(0, 60) +
  
        #ggrepel::geom_text_repel(x = df$SampleID, y = df$Abundance,  aes(label = Labels),
        #inherit.aes = FALSE, min.segment.length = 0,nudge_y = 20,size=3, max.overlaps = Inf) +
      #awhite + 
        theme(strip.text.y = element_text(angle = 0))  +
        #theme(axis.text.x=element_blank(),
        #axis.text.y=element_blank(),
        #axis.title.y.left = element_blank(),
        #axis.ticks.y =element_blank() 
        #) +
        theme(
         panel.background = element_rect(fill='transparent'), #transparent panel bg
         plot.background = element_rect(fill='transparent', color=NA), #transparent plot bg
         #legend.background = element_rect(fill='transparent'), #transparent legend bg
         #legend.box.background = element_rect(fill='transparent') #transparent legend panel
         ) +
      theme(axis.title.x.bottom =  element_text(color="grey13"), 
        strip.text = element_text(color = "black", face= "bold")) +
      theme(
        legend.title=element_text(size=9), 
        legend.text=element_text(size=9), 
        axis.text.x.bottom = element_text(size= 9, face = "bold", angle = 45, hjust = 1),
        strip.text.y.left = element_text(size = 9,face = "bold"), 
        axis.title.y.left = element_text(size=12,face = "bold"))
      
      g <- ggplot_gtable(ggplot_build(p))
        stripr <- which(grepl('strip-t', g$layout$name))
        fills <- alpha(COL, 0.8)
        k <- 1
        for (iii in stripr) {
        j <- which(grepl('rect', g$grobs[[iii]]$grobs[[1]]$childrenOrder))
        g$grobs[[iii]]$grobs[[1]]$children[[j]]$gp$fill <- fills[k]
        k <- k+1}
    
        Alpha_Diversity_BarPlot <- cowplot::plot_grid(g, labels = c(""), ncol = 1)
        Alpha_Diversity_BarPlot <- plot_grid(Alpha_Diversity_BarPlot, ncol = 1, rel_heights = c(100))
        ggsave(Alpha_Diversity_BarPlot, filename = FILENAME, path = pathPlots, device='png', dpi=300,
               width = WIDTH, height = 6)
        
    } else if (flipped == TRUE) {
      # New facet label names for supp variable
    
      COL <- col.Palette$col.Palette.Reps[names(col.Palette$col.Palette.Reps) %in%                                    unique(sample_data(ps)$Reps)]
      
      Short.labs <- gsub("[^0-9]", "", Alpha_Diversity_df$Reps)
      names(Short.labs) <- Alpha_Diversity_df$Reps

      
      p <- ggplot(Alpha_Diversity_df, aes(x = SampleID, y = Abundance, fill = factor(Taxa))) + 
            geom_bar(stat = "identity") +
            coord_flip() +
            #facet_grid(rows = vars(Reps), drop = TRUE, scale = "free", space = "free_x", switch = "y")
        facet_grid(rows = vars(Reps), drop = TRUE, scale = "free", space = "free_x", switch = "y", 
              labeller = labeller(Reps = Short.labs)) 
      
      p <- p + 
        ylab("Relative Abundance [%] \n") + xlab("") + 
        labs(fill="") + #atheme+
        scale_fill_manual(values = color_mapping) +
        ylim(0, 60) +
            theme(axis.title.y=element_blank(),
            axis.text.y=element_blank(),
            axis.ticks.y=element_blank()) +
        theme(
         panel.background = element_rect(fill='transparent'), #transparent panel bg
         plot.background = element_rect(fill='transparent', color=NA), #transparent plot bg
         ) +
      theme(axis.title.x.bottom =  element_text(color="grey13", size=15, face ="bold"), 
        strip.text = element_text(color = "black", face= "bold")) +
      theme(
        legend.title=element_text(size=9), 
        legend.text=element_text(size=9), 
        axis.text.x.bottom = element_text(size= 15, face = "bold", angle = 45, hjust = 1),
        #strip.text.y.left = element_text(size = 9,face = "bold"), 
        #axis.title.y.left = element_text(size=12,face = "bold")
        ) +
        theme(legend.position = "none")
      
      g <- ggplot_gtable(ggplot_build(p))
        stripr <- which(grepl('strip-l', g$layout$name))
        fills <- alpha(COL, 0.8)
        k <- 1
        for (iii in stripr) {
        j <- which(grepl('rect', g$grobs[[iii]]$grobs[[1]]$childrenOrder))
        g$grobs[[iii]]$grobs[[1]]$children[[j]]$gp$fill <- fills[k]
        k <- k+1}
        
    Alpha_Diversity_BarPlot <- cowplot::plot_grid(g, labels = c(""), ncol = 1)
    Alpha_Diversity_BarPlot <- plot_grid(Alpha_Diversity_BarPlot, ncol = 1, rel_heights = c(100))
    ggsave(Alpha_Diversity_BarPlot, filename = FILENAME, path = pathPlots, device='png', dpi=300, width = 2.7, height = HEIGHT)
    
    }
}

#-

9.4 WGCNA Cytoscape

Excluded for knitting, works fine, just un-# the whole chunk

# #You need to download and Open Cytoscape on your computer for this to work 
# #http://www.bioconductor.org/packages/devel/bioc/vignettes/rWikiPathways/inst/doc/Pathway-Analysis.html
# #https://yulab-smu.top/biomedical-knowledge-mining-book/clusterprofiler-go.html
# library(tidyverse)
# library(dplyr)
# library(plyr)
# library(clusterProfiler)
# library(enrichplot)
# library(ggplot2)
# library("org.Hs.eg.db")
# library("org.Dr.eg.db")
# library("org.Mm.eg.db")
# library(DOSE)
# library(rWikiPathways)
# library(jsonlite)
# library("DOSE")
# library("GO.db")
# library("GSEABase")
# library("dplyr")
# library("tidyr")
# library("stringr")
# library("RColorBrewer")
# library("rWikiPathways")
# library("RCy3")
# 
# ###########
# #Cytoscape#
# ###########
# #list of cytoscape apps to install
# #Open Cytoscape
#  cytoscapePing()
# # #list of app to install Do that once!
# # installApp('WikiPathways')
# # installApp('CyTargetLinker')
# # installApp('stringApp')
# # installApp('enrichmentMap')
# # installApp('autoannotate')
# # installApp('wordcloud')
# # installApp('stringapp')
# # installApp('aMatReader')
# # installApp('clustermaker2')
# # cytoscapeVersionInfo ()
# # installApp('STRINGapp')
# # installApp('aMatReader')
# # installApp('clusterMaker2')
# # The following setting is important, do not omit.
# library(WGCNA)
# options(stringsAsFactors = FALSE);
# for (Datasetname  in names(WGCNA_list)) {
#   
#   omics_data    <- WGCNA_list[[Datasetname]]$omics_data
#   datTraits     <- WGCNA_list[[Datasetname]]$datTraits
#   moduleLabels  <- WGCNA_list[[Datasetname]]$moduleLabels
#   moduleColors  <- WGCNA_list[[Datasetname]]$moduleColors
#   MEs           <- WGCNA_list[[Datasetname]]$MEs
#   network       <- WGCNA_list[[Datasetname]]$network
#   ps            <- WGCNA_list[[Datasetname]]$ps
#   softPower     <- WGCNA_list[[Datasetname]]$softPower
#   GeneAnno      <- WGCNA_list$WF$SSUWGCNAlist
#   TaxAnno       <- data.frame(tax_table(WGCNA_list$WF$ps)) %>% rownames_to_column(var="ASV")
#   
#   RepSums <- ps %>%
#       transform_sample_counts(function(x) {x/sum(x)*100}) %>%
#       phyloseq::otu_table() %>%
#       as.data.frame() %>%
#       rownames_to_column(var = "SampleID") %>%
#       dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
#       dplyr::group_by(Replicates) %>%
#       dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>% 
#       t() %>%
#       as.data.frame() %>%
#       `colnames<-`(.[1, ]) %>%
#       .[-1, ] %>%
#       stats::setNames(paste0("avg_", colnames(.))) %>%
#       mutate_all(as.numeric) %>%
#       round(., digits = 2) %>%
#       rownames_to_column(var="ASV") 
#     
#     SeasonSums <- ps %>%
#       transform_sample_counts(function(x) {x/sum(x)*100}) %>%
#       phyloseq::otu_table() %>%
#       as.data.frame() %>%
#       rownames_to_column(var = "SampleID") %>%
#       dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
#       dplyr::group_by(Season) %>%
#       dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>% 
#       t() %>%
#       as.data.frame() %>%
#       `colnames<-`(.[1, ]) %>%
#       .[-1, ] %>%
#       stats::setNames(paste0("avg_", colnames(.))) %>%
#       mutate_all(as.numeric) %>%
#       round(., digits = 2) %>%
#       rownames_to_column(var="ASV") 
#     
#   ############################################################################
#   #Create WGCNA Dataframe: Species, ModuleMembership, Correlation to Abiotics#
#   ############################################################################
#   for (i in names(MEs)) {
#      tryCatch({
#     
#     ModuleOfInterst <- paste(sub("ME", "", i))
#   
#     TOM = TOMsimilarityFromExpr(omics_data, power = softPower, nThreads =8);
#     modules = as.numeric(sub("ME", "", i))
# 
#     colors <- labels2colors(modules)
#     genes  <- names(omics_data)
#     inModule <- is.finite(match(moduleColors, colors));
#     ASV <- genes[inModule];
#     modColors <- moduleColors[inModule]
#     Data <- as.data.frame(cbind(ASV, modColors))
#   
#     Data  <- Data %>%  
#        left_join( #Add relative ASVmeans 
#        data.frame(t(phyloseq::otu_table(ps %>%
#         transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
#           dplyr::mutate(ASVmeans = rowMeans(.)) %>%
#           dplyr::mutate(ASV = rownames(.)) %>% 
#           dplyr::select(ASV, ASVmeans)) %>% 
#       left_join(as.data.frame(tax_table(ps)) %>% rownames_to_column(var="ASV"))
#     
#       geneModuleMembership <- cor(omics_data, MEs, use = "p") %>%
#         as.data.frame() %>% #Create ModuleMembership 
#         dplyr::select(paste("ME", ModuleOfInterst, sep="")) %>%
#         setNames(paste0("MM"))  #No Individual naming here for Cytoscape columns beeing equal
#     
#       MMPvalue <- #Creating p.values for Module memberships
#         as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), length(rownames(MEs)))) %>%
#         dplyr::select(paste("MM", sep="")) %>%
#         setNames(paste0("p.MM")) 
#  
#      #setNames(paste0(sub("MM", "p.MM", names(.)), sep="")) %>%
#      #dplyr::select(paste("p.MM", sep=""))
# 
#     Data <- Data %>%
#        left_join(geneModuleMembership  %>% rownames_to_column(var="ASV")) %>%
#        left_join(MMPvalue  %>% rownames_to_column(var="ASV")) %>%
#        left_join(SeasonSums) %>%
#        left_join(RepSums)
#     #dplyr::arrange(desc(ASVmeans))
#     rownames(Data) <- Data$ASV
#     
#     Data$ASVname <- gsub("^ASV\\d+:", "", Data$ASV)
#     
#     
#     # Select the corresponding Topological Overlap
#     TOM_mod = TOM[inModule, inModule];
#     dimnames(TOM_mod) = list(Data$ASV, Data$ASV)
#     
#     #######################
#     #Filter for Parameters#
#     #######################
#     Data_filt <- Data %>% 
#             filter(abs(MM) > 0.5 & p.MM < 0.05 & ASVmeans > 0.005) 
#     TOM_filt <- TOM_mod[Data_filt$ASV, Data_filt$ASV]
#   # Export the network into edge and node list files Cytoscape can read
#   #https://support.bioconductor.org/p/95965/
#   # My operational answer is to select a threshold that keeps the file size manageable, then use     filtering in Cytoscape to interactively choose a threshold that results in an informative plot. This assumes that Cytoscape is only used for visualization; if you're going to use Cytoscape for further analysis, you should probably set the threshold to zero.
#   #https://www.biostars.org/p/9514423/
# 
#     cyt = exportNetworkToCytoscape(TOM_filt,
#     edgeFile =     paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_edges"), paste(modules, Datasetname, sep="_"), sep="_"), "txt", sep ="."),
#     nodeFile =  paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_nodes"), paste(modules, Datasetname, sep="_"), sep="_"), "txt", sep ="."),
#     weighted = TRUE,
#     threshold = 0.01, #Doing real Filtering in Cytoscape for Abundance
#     nodeNames = Data_filt$ASV,
#     altNodeNames = Data_filt$ASVname,
#     nodeAttr = Data_filt)
#   }, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})}
# }
# 
# ###########
# #Cytoscape#
# ###########
# #list of cytoscape apps to install
# #Open Cytoscape
# cytoscapePing()
# ##############################
# #Import Networks to Cytoscape#
# ##############################
# library("RCy3")
# cytoscapePing () # make sure cytoscape is open
# cytoscapeVersionInfo ()
# for (Datasetname  in names(WGCNA_list)) {
#   
#   omics_data    <- WGCNA_list[[Datasetname]]$omics_data
#   datTraits     <- WGCNA_list[[Datasetname]]$datTraits
#   moduleLabels  <- WGCNA_list[[Datasetname]]$moduleLabels
#   moduleColors  <- WGCNA_list[[Datasetname]]$moduleColors
#   MEs           <- WGCNA_list[[Datasetname]]$MEs
# 
#   
#   for (i in names(MEs)) {
#       tryCatch({
# 
#     modules = as.numeric(sub("ME", "", i))
#     colors = labels2colors(modules)
# 
#     node <- read.delim(paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_nodes"), 
#                                    paste(modules, Datasetname, sep="_"), sep="_"), "txt", sep ="."))
#     
#     edge <- read.delim(paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_edges"), 
#                                    paste(modules, Datasetname, sep="_"), sep="_"), "txt", sep ="."))
# 
#     
#     #colnames(node)
#     #colnames(node) <- c("id","altName","node_attributes")
# 
#     node <- node %>% 
#       dplyr::mutate(id = nodeName) %>%
#       dplyr::select(id, everything())
#     
#     edge <- edge %>%
#       rename("fromNode"     = "source") %>%
#       rename("toNode"       =  "target") %>%
#       rename("direction"    = "interaction") 
# 
#     createNetworkFromDataFrames(node, edge, title= paste(modules, Datasetname, sep="_"),
#                               collection=paste("DataFrame", Datasetname, sep="_"))
# 
# 
#     ###########################
#     #Set some style parameters#
#     ###########################
#     #getVisualPropertyNames()
#     setVisualStyle('Sample1')
#     # set up my own style
#     style.name = paste("Style", Datasetname, sep="_")
#     defaults <- list( #NODE_SHAPE="diamond",
#                     NODE_SIZE=10,
#                     NODE_FILL_COLOR ="black",
#                     EDGE_TRANSPARENCY=120
#                     #NODE_LABEL_POSITION="W,E,c,0.00,0.00"
#                     )
#     nodeLabels <- mapVisualProperty('node label','id','p')
#     arrowShapes <- mapVisualProperty('Edge Target Arrow     Shape','interaction','d',c("activates","inhibits","interacts"),c("Arrow","T","None"))
#   
#   edgeWidth <- mapVisualProperty('edge width','weight','p')
#   
#     nodeFills <- mapVisualProperty('node fill color','Phylum','d', names(phylum_colors_Cytoscape), as.character(phylum_colors_Cytoscape))
# 
#   #nodeSize<- mapVisualProperty('NODE_SIZE','avg_Spring_22','c',c(0, 100),c(10, 1000))
#   
#   nodeSize <- mapVisualProperty('NODE_SIZE', paste0("ASVmeans"), 'c', c(0, 100), c(15, 1000))
#   
#   nodeLabelSize <- mapVisualProperty('NODE_LABEL_FONT_SIZE', paste0("ASVmeans"), 'c', c(0, 100), c(15, 100))
# 
#   nodeLabel<-mapVisualProperty('node label','ASVname','p')
#     
#    createVisualStyle(style.name, defaults, list(nodeLabels,arrowShapes, nodeFills, nodeLabel, nodeSize, edgeWidth, nodeLabelSize))
#    
#   setVisualStyle(style.name)
# 
#    }, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
#   }
# }
# 
# 
# # For Visualization in Figure 3B change to cytoscape and do: 
# # Cytoscape -> Import -> Table -> SSU_Data with Column relMeanASV -> To selected network only 
# #Styles -> LabelSize and LableText continous 
# #https://www.biostars.org/p/192885/#192910
# #https://www.biostars.org/p/454313/
# 
# #https://manual.cytoscape.org/en/stable/Styles.html#tutorial-3-creating-a-new-style-with-a-continuous-mapping
# #https://manual.cytoscape.org/en/stable/Styles.html
# #https://www.biostars.org/p/454866/
# 
# 

9.4.1 Overview network

# #You need to download and Open Cytoscape on your computer for this to work 
# #http://www.bioconductor.org/packages/devel/bioc/vignettes/rWikiPathways/inst/doc/Pathway-Analysis.html
# #https://yulab-smu.top/biomedical-knowledge-mining-book/clusterprofiler-go.html
# 
# ###########
# #Cytoscape#
# ###########
# #list of cytoscape apps to install
# #Open Cytoscape
# cytoscapePing()
# library(WGCNA)
# options(stringsAsFactors = FALSE);
# for (Datasetname  in names(WGCNA_list)) {
#   
#   omics_data    <- WGCNA_list[[Datasetname]]$omics_data
#   datTraits     <- WGCNA_list[[Datasetname]]$datTraits
#   moduleLabels  <- WGCNA_list[[Datasetname]]$moduleLabels
#   moduleColors  <- WGCNA_list[[Datasetname]]$moduleColors
#   MEs           <- WGCNA_list[[Datasetname]]$MEs
#   network       <- WGCNA_list[[Datasetname]]$network
#   ps            <- WGCNA_list[[Datasetname]]$ps
#   softPower     <- WGCNA_list[[Datasetname]]$softPower
#   GeneAnno      <- WGCNA_list$WF$SSUWGCNAlist
#   TaxAnno       <- data.frame(tax_table(WGCNA_list$WF$ps)) %>% rownames_to_column(var="ASV")
#   
#   RepSums <- ps %>%
#       transform_sample_counts(function(x) {x/sum(x)*100}) %>%
#       phyloseq::otu_table() %>%
#       as.data.frame() %>%
#       rownames_to_column(var = "SampleID") %>%
#       dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
#       dplyr::group_by(Replicates) %>%
#       dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>% 
#       t() %>%
#       as.data.frame() %>%
#       `colnames<-`(.[1, ]) %>%
#       .[-1, ] %>%
#       stats::setNames(paste0("avg_", colnames(.))) %>%
#       mutate_all(as.numeric) %>%
#       round(., digits = 2) %>%
#       rownames_to_column(var="ASV") 
#     
#     SeasonSums <- ps %>%
#       transform_sample_counts(function(x) {x/sum(x)*100}) %>%
#       phyloseq::otu_table() %>%
#       as.data.frame() %>%
#       rownames_to_column(var = "SampleID") %>%
#       dplyr::left_join(SAMDF16S, by = c("SampleID" = "SampleID")) %>%
#       dplyr::group_by(Season) %>%
#       dplyr::summarise(dplyr::across(rownames(tax_table(ps)), mean, na.rm = TRUE)) %>% 
#       t() %>%
#       as.data.frame() %>%
#       `colnames<-`(.[1, ]) %>%
#       .[-1, ] %>%
#       stats::setNames(paste0("avg_", colnames(.))) %>%
#       mutate_all(as.numeric) %>%
#       round(., digits = 2) %>%
#       rownames_to_column(var="ASV") 
#     
#   ############################################################################
#   #Create WGCNA Dataframe: Species, ModuleMembership, Correlation to Abiotics#
#   ############################################################################
#   
#     TOM = TOMsimilarityFromExpr(omics_data, power = softPower, nThreads =8);
#     modules = as.numeric(sub("ME", "", i))
# 
#     
#     genes  <- names(omics_data)
#     inModule <- is.finite(match(moduleColors, moduleColors));
#     ASV <- genes[inModule];
#     modColors <- moduleColors[inModule]
#     Data <- as.data.frame(cbind(ASV, modColors))
#   
#     Data  <- Data %>%  
#        left_join( #Add relative ASVmeans 
#        data.frame(t(phyloseq::otu_table(ps %>%
#         transform_sample_counts(function(x) {x/sum(x)*100})))) %>%
#           dplyr::mutate(ASVmeans = rowMeans(.)) %>%
#           dplyr::mutate(ASV = rownames(.)) %>% 
#           dplyr::select(ASV, ASVmeans)) %>% 
#       left_join(as.data.frame(tax_table(ps)) %>% rownames_to_column(var="ASV"))
#     
#       geneModuleMembership <- cor(omics_data, MEs, use = "p") %>%
#         as.data.frame() #%>% #Create ModuleMembership 
#         #dplyr::select(paste("ME", ModuleOfInterst, sep="")) %>%
#         #setNames(paste0("MM"))  #No Individual naming here for Cytoscape columns beeing equal
#     
#       MMPvalue <- #Creating p.values for Module memberships
#         as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), length(rownames(MEs)))) #%>%
#         #dplyr::select(paste("MM", sep="")) %>%
#       names(MMPvalue) <- paste(sub("ME", "p.ME", names(MMPvalue)))
#   
#  
#      #setNames(paste0(sub("MM", "p.MM", names(.)), sep="")) %>%
#      #dplyr::select(paste("p.MM", sep=""))
# 
#     Data <- Data %>%
#        dplyr::left_join(geneModuleMembership  %>% tibble::rownames_to_column(var="ASV")) %>%
#        left_join(MMPvalue  %>% rownames_to_column(var="ASV")) %>%
#        left_join(SeasonSums) %>%
#        left_join(RepSums)
#     #dplyr::arrange(desc(ASVmeans))
#     rownames(Data) <- Data$ASV
#     
#     Data$ASVname <- gsub("^ASV\\d+:", "", Data$ASV)
#     
#     
#     # Select the corresponding Topological Overlap
#     TOM_mod = TOM[inModule, inModule];
#     dimnames(TOM_mod) = list(Data$ASV, Data$ASV)
#     
#     #######################
#     #Filter for Parameters#
#     #######################
#     #Data_filt <- Data %>% 
#     #        filter(abs(MM) > 0.5 & p.MM < 0.05 & ASVmeans > 0.005) 
#     
#     Data_filt <- Data %>%
#       group_by(modColors) %>%
#       top_n(50, ASVmeans)
#     
#     TOM_filt <- TOM_mod[Data_filt$ASV, Data_filt$ASV]
#     
#     
#     
#     
#   # Export the network into edge and node list files Cytoscape can read
#   #https://support.bioconductor.org/p/95965/
#   # My operational answer is to select a threshold that keeps the file size manageable, then use     filtering in Cytoscape to interactively choose a threshold that results in an informative plot. This assumes that Cytoscape is only used for visualization; if you're going to use Cytoscape for further analysis, you should probably set the threshold to zero.
#   #https://www.biostars.org/p/9514423/
# 
#     cyt = exportNetworkToCytoscape(TOM_filt,
#     edgeFile =     paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_edges"), paste(Datasetname, sep="_"), sep="_"), "txt", sep ="."),
#     nodeFile =  paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_nodes"), paste(Datasetname, sep="_"), sep="_"), "txt", sep ="."),
#     weighted = TRUE,
#     threshold = 0.1, #Doing real Filtering in Cytoscape for Abundance
#     nodeNames = Data_filt$ASV,
#     altNodeNames = Data_filt$ASVname,
#     nodeAttr = Data_filt)
# }
# 
# ###########
# #Cytoscape#
# ###########
# ##############################
# #Import Networks to Cytoscape#
# ##############################
# library("RCy3")
# cytoscapePing () # make sure cytoscape is open
# cytoscapeVersionInfo ()
# for (Datasetname  in names(WGCNA_list)) {
#   
#   omics_data    <- WGCNA_list[[Datasetname]]$omics_data
#   datTraits     <- WGCNA_list[[Datasetname]]$datTraits
#   moduleLabels  <- WGCNA_list[[Datasetname]]$moduleLabels
#   moduleColors  <- WGCNA_list[[Datasetname]]$moduleColors
#   MEs           <- WGCNA_list[[Datasetname]]$MEs
# 
#     modules = as.numeric(sub("ME", "", i))
#     colors = labels2colors(modules)
# 
#     node <- read.delim(paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_nodes"), 
#                                    paste(Datasetname, sep="_"), sep="_"), "txt", sep ="."))
#     
#     edge <- read.delim(paste(paste(file.path(path_Output_16S,"CytoscapeInput_ASV_edges"), 
#                                    paste(Datasetname, sep="_"), sep="_"), "txt", sep ="."))
# 
#     
#     #colnames(node)
#     #colnames(node) <- c("id","altName","node_attributes")
# 
#     node <- node %>% 
#       dplyr::mutate(id = nodeName) %>%
#       dplyr::select(id, everything())
# 
#     
#     edge <- edge %>% plyr::rename(c("fromNode" = "source",
#                             "toNode"   =  "target", 
#                             "direction" = "interaction"))
#   
# 
#     createNetworkFromDataFrames(node, edge, title= paste(Datasetname, sep="_"),
#                               collection=paste("DataFrame", Datasetname, sep="_"))
# 
# 
#     ###########################
#     #Set some style parameters#
#     ###########################
#     #getVisualPropertyNames()
#     setVisualStyle('Sample1')
#     # set up my own style
#     style.name = paste("Style_overall", Datasetname, sep="_")
#     defaults <- list( #NODE_SHAPE="diamond",
#                     NODE_SIZE=10,
#                     NODE_FILL_COLOR ="black",
#                     EDGE_TRANSPARENCY=120
#                     #NODE_LABEL_POSITION="W,E,c,0.00,0.00"
#                     )
#     nodeLabels <- mapVisualProperty('node label','id','p')
#     arrowShapes <- mapVisualProperty('Edge Target Arrow     Shape','interaction','d',c("activates","inhibits","interacts"),c("Arrow","T","None"))
#   
#   edgeWidth <- mapVisualProperty('edge width','weight','p')
#   
#     nodeFills <- mapVisualProperty('node fill color','Phylum','d', names(phylum_colors_Cytoscape), as.character(phylum_colors_Cytoscape))
# 
#   #nodeSize<- mapVisualProperty('NODE_SIZE','avg_Spring_22','c',c(0, 100),c(10, 1000))
#   
#   nodeSize <- mapVisualProperty('NODE_SIZE', paste0("ASVmeans"), 'c', c(0, 100), c(15, 1000))
#   
#   nodeLabelSize <- mapVisualProperty('NODE_LABEL_FONT_SIZE', paste0("ASVmeans"), 'c', c(0, 100), c(15, 100))
# 
#   nodeLabel<-mapVisualProperty('node label','ASVname','p')
#     
#    createVisualStyle(style.name, defaults, list(nodeLabels,arrowShapes, nodeFills, nodeLabel, nodeSize, edgeWidth, nodeLabelSize))
#    
#   setVisualStyle(style.name)
# 
#   }
# 
# 
# 
# # For Visualization in Figure 3B change to cytoscape and do: 
# # Cytoscape -> Import -> Table -> SSU_Data with Column relMeanASV -> To selected network only 
# #Styles -> LabelSize and LableText continous 
# #https://www.biostars.org/p/192885/#192910
# #https://www.biostars.org/p/454313/
# 
# #https://manual.cytoscape.org/en/stable/Styles.html#tutorial-3-creating-a-new-style-with-a-continuous-mapping
# #https://manual.cytoscape.org/en/stable/Styles.html
# #https://www.biostars.org/p/454866/

#-

10 Picrust2

PICRUST2 https://github.com/picrust/picrust2/wiki/Full-pipeline-script

How to interpret PICRUSt2 output The most important thing to keep in mind when running PICRUSt2 is that predictions of functional potential are output. This data can be useful for generating hypotheses, but should always be interpreted cautiously especially when focused on a single function or predictions for a single ASV (or OTU). You should read through the key limitations of metagenome inference before proceeding. The final output tables produced by PICRUSt2 are essentially the read depth per ASV multiplied by the predicted function abundances per ASV. The output is calculated slightly differently depending on the file as described below, but that is the basic idea. Accordingly, the final tables are not output in terms of relative abundance and the abundances per sample will sum to different numbers.

There are many possible ways to analyze the output of PICRUSt2, such as rounding the data and using the ALDEx2 R package. However, no matter what approach you use it is important to keep in mind that the data is compositional. This means that it would be inappropriate to run a Wilcoxon test (for example) on the output table without first transforming the data somehow (e.g. to relative abundance, centred-log ratio transformation, etc.).

The Enzyme Commission (EC) [4] has classified all enzymes based on the enzymatic reactions they catalyze. Each enzyme has an EC number, which is a hierarchical number that distinguishes enzymes by the type of reactions they catalyze.

awk ‘{print $1 }’ ps_asv.fna > seqs1.fna sed ‘s/://g’ < seqs1.fna > seqs.fna

The input files should be a FASTA of amplicon sequences variants (ASVs; i.e. your representative sequences, not your raw reads, which is study_seqs.fna below) and a BIOM table of the abundance of each ASV across each sample (study_seqs.biom below). Note that a tab-delimited table with ASV ids as the first column and sample abundances as all subsequent columns will also work.

10.1 extract sequences

10.2 Run Picrust2

Version: 2.3.0_b mamba create -n picrust2 -c bioconda picrust2=2.3.0_b

Not installable on Apple M1 atm also not downgrading to 2.3.0_b because it requires pyhton <2.7 which also does not exist for M1 mamba create -n picrust2 -c bioconda -c conda-forge picrust2=2.5.2

Also on Cluster Biom-table seems to have a problem

10.3 KO to KEGG

10.5 DAA

10.6 Combine

Specific Child Terms

-

11 Source Analysis

11.1 Feast

Does not work atm

https://github.com/cozygene/FEAST https://www.nature.com/articles/s41592-019-0431-x

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10125963/

# library(FEAST)
# metadata <- Load_metadata(metadata_path = "/Users/admin/Downloads/metadata_example_multi.txt")
# otus <- Load_CountMatrix(CountMatrix_path = "/Users/admin/Downloads/otu_example_multi.txt")
# 
# FEAST_output <- FEAST(C = otus, metadata = metadata, different_sources_flag = 1, dir_path = "~/FEAST/Data_files/", outfile="demo")
# 
# FEAST_output <- FEAST(C = otus, metadata = metadata, different_sources_flag = 1)
# 
# devtools::install_github("cozygene/FEAST", ref = "FEAST")

-

12 Final dataset

12.1 Read Final Dataset

#saveRDS(pslist, file.path(path_Output_16S,paste(paste(save_name,  "16S_merge_Filter_ASV_besttax", Date, sep="_"), ".rds", sep="")))

# pslistraw <- readRDS(file.path(path_Output_16S, paste(paste(save_name,"ps_16S_merge_pslistraw", Date, sep="_"), ".rds", sep="")))
# 
# pslist <- readRDS(file.path(path_Output_16S,paste(paste(save_name, "16S_merge_Filter_ASV_besttax", Date, sep="_"), ".rds", sep="")))
# 
# WGCNA_list <- readRDS(file = paste0(file.path(path_Output_16S, paste(paste(save_name, "WGCNA_list", Date, sep="_"), ".rds", sep=""))))

12.2 Update samdf

when necessary

# require(tidyverse)
# pslist <- pslist[!grepl("TSE|clr", names(pslist))]
# names(pslist)
# 
# for (i in names(pslist)[grepl("ps", names(pslist))]){
#     ps <- pslist[[i]]
#     g<- names(pslist[i])
# 
#     A<-SAMDF16S[SAMDF16S$SampleID %in% rownames(sample_data(ps)),]
#     Sample_Data <- as.data.frame(sample_data(ps))
#     A<-left_join(A, Sample_Data[,names(Sample_Data) %in% c("SampleID", "nonchim", "Run", "Input")])
#     rownames(A) <- A$SampleID
# 
#     #PHY <-phy_tree(ps)
#     OTU <- otu_table(ps, taxa_are_rows=FALSE)
#     TAX <- tax_table(ps)
#     pslist[[i]] <- phyloseq(otu_table(OTU, taxa_are_rows=FALSE),
#                sample_data(A), tax_table(TAX)) #, phy_tree(PHY)
#   }
# 
# for (i in names(pslist)[grepl("ps", names(pslist))]){
#   g <- paste(i)
#   a <- length(pslist)
#   ps <- pslist[[i]]
#   ps_clr <- microbiome::transform(ps, "clr")
#   pslist[[a+1]] <- ps_clr
#   names(pslist)[[a+1]] <- paste("clr", sub("ps", "\\1", g), sep="")}
# 
# for (i in names(pslist)[grepl("clr", names(pslist))]){
#   g <- paste(i)
#   a <- length(pslist)
#   clr <- pslist[[i]]
#   TSE<-mia::makeTreeSummarizedExperimentFromPhyloseq(clr)
#   pslist[[a+1]] <- TSE
#   names(pslist)[[a+1]] <- paste("TSEc.l.r", sub("clr", "\\1", g), sep="")}

#-